Share Predictions with Advanced Time-Series Models¶
For this challenge, we have a dataset made up of 4 weekly time series per brand: Share_value, Share_volume, Price and DP. The objective is to analyze and forecast Share_value and Share_volume for 8 brands on a weekly basis for the next year (52 weeks); the predictions will be evaluated with the MAE (mean absolute error).
To do this, we will perform an in-depth exploratory analysis of the data to understand it, test several advanced forecasting models, and select the best one to produce the final 52-week predictions.
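Since the MAE drives every modeling decision below, it is worth pinning down the metric itself: the mean absolute error, averaged over all forecasted points (and, in our case, over the 16 target series). A minimal sketch with toy numbers, not the real data:

```python
import numpy as np

def mae(y_true, y_pred):
    """Mean absolute error between two equal-length sequences."""
    y_true, y_pred = np.asarray(y_true), np.asarray(y_pred)
    return np.mean(np.abs(y_true - y_pred))

# Toy example: errors of 0.5, 0.0 and 1.0 average to 0.5
print(mae([1.0, 2.0, 3.0], [1.5, 2.0, 2.0]))  # → 0.5
```

In the evaluation below we will compute one MAE per target series and then average the 16 values.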
Libraries¶
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import statsmodels.api as sm
import pmdarima as pm
import xgboost as xgb
from pylab import rcParams
from statsmodels.tsa.stattools import adfuller
from statsmodels.graphics.tsaplots import plot_acf, plot_pacf
from statsmodels.tsa.exponential_smoothing.ets import ETSModel
from statsmodels.tsa.api import VAR
from sklearn.metrics import mean_absolute_error
from sklearn.decomposition import PCA
from sklearn.ensemble import RandomForestRegressor
Load dataset¶
Let's load the data and check its format:
data = pd.read_csv('competition_training.csv')
data.head()
| | Brand | Date_Monday | Share_value | Share_volume | Price | DP |
|---|---|---|---|---|---|---|
| 0 | Brand1 | 2021-12-12 | 18.982 | 16.945 | 17.531 | 87.082 |
| 1 | Brand1 | 2021-12-19 | 19.421 | 17.927 | 17.386 | 88.364 |
| 2 | Brand1 | 2021-12-26 | 17.302 | 16.728 | 17.426 | 91.269 |
| 3 | Brand1 | 2022-01-02 | 15.881 | 15.174 | 17.439 | 92.315 |
| 4 | Brand1 | 2022-01-09 | 16.775 | 14.653 | 17.772 | 83.587 |
Shape:
data.shape
(896, 6)
Dates:
data['Date_Monday'].unique()
array(['2021-12-12', '2021-12-19', '2021-12-26', '2022-01-02',
'2022-01-09', '2022-01-16', '2022-01-23', '2022-01-30',
'2022-02-06', '2022-02-13', '2022-02-20', '2022-02-27',
'2022-03-06', '2022-03-13', '2022-03-20', '2022-03-27',
'2022-04-03', '2022-04-10', '2022-04-17', '2022-04-24',
'2022-05-01', '2022-05-08', '2022-05-15', '2022-05-22',
'2022-05-29', '2022-06-05', '2022-06-12', '2022-06-19',
'2022-06-26', '2022-07-03', '2022-07-10', '2022-07-17',
'2022-07-24', '2022-07-31', '2022-08-07', '2022-08-14',
'2022-08-21', '2022-08-28', '2022-09-04', '2022-09-11',
'2022-09-18', '2022-09-25', '2022-10-02', '2022-10-09',
'2022-10-16', '2022-10-23', '2022-10-30', '2022-11-06',
'2022-11-13', '2022-11-20', '2022-11-27', '2022-12-04',
'2022-12-11', '2022-12-18', '2022-12-25', '2023-01-01',
'2023-01-08', '2023-01-15', '2023-01-22', '2023-01-29',
'2023-02-05', '2023-02-12', '2023-02-19', '2023-02-26',
'2023-03-05', '2023-03-12', '2023-03-19', '2023-03-26',
'2023-04-02', '2023-04-09', '2023-04-16', '2023-04-23',
'2023-04-30', '2023-05-07', '2023-05-14', '2023-05-21',
'2023-05-28', '2023-06-04', '2023-06-11', '2023-06-18',
'2023-06-25', '2023-07-02', '2023-07-09', '2023-07-16',
'2023-07-23', '2023-07-30', '2023-08-06', '2023-08-13',
'2023-08-20', '2023-08-27', '2023-09-03', '2023-09-10',
'2023-09-17', '2023-09-24', '2023-10-01', '2023-10-08',
'2023-10-15', '2023-10-22', '2023-10-29', '2023-11-05',
'2023-11-12', '2023-11-19', '2023-11-26', '2023-12-03',
'2023-12-10', '2023-12-17', '2023-12-24', '2023-12-31',
'2024-01-07', '2024-01-14', '2024-01-21', '2024-01-28'],
dtype=object)
We are going to pivot the data so that each brand has its own columns and the weekly date is the index. We will use the name Date_Sunday, as the dates are actually Sundays, not Mondays (this is not relevant to the modeling, and in the final forecast we will switch back to the name Date_Monday).
# Convert 'Date_Monday' to datetime format and set it as the index
data = data.rename(columns={'Date_Monday': 'Date_Sunday'})
data["Date_Sunday"] = pd.to_datetime(data["Date_Sunday"])
data.set_index("Date_Sunday", inplace=True)
# Pivot the table to get each Brand's data as separate columns
df = data.pivot_table(index="Date_Sunday", columns="Brand")
# Flatten the multi-index columns and rename them properly
df.columns = [f"{col[0]}_{col[1].replace('Brand', '')}" for col in df.columns]
# Sort
df = df.sort_index()
df.shape
(112, 32)
So the columns are:
df.columns
Index(['DP_1', 'DP_2', 'DP_3', 'DP_4', 'DP_5', 'DP_6', 'DP_7', 'DP_8',
'Price_1', 'Price_2', 'Price_3', 'Price_4', 'Price_5', 'Price_6',
'Price_7', 'Price_8', 'Share_value_1', 'Share_value_2', 'Share_value_3',
'Share_value_4', 'Share_value_5', 'Share_value_6', 'Share_value_7',
'Share_value_8', 'Share_volume_1', 'Share_volume_2', 'Share_volume_3',
'Share_volume_4', 'Share_volume_5', 'Share_volume_6', 'Share_volume_7',
'Share_volume_8'],
dtype='object')
Now we have the predictors DP and Price and the targets Share_value and Share_volume in columns. Let's add the week number, month and year as features.
# Create Week number, Month and Year features
df['Week_number'] = df.index.to_series().apply(lambda x: int(x.strftime("%W")))
df['Month'] = df.index.month
df['Year'] = df.index.year
df = df.asfreq('W-SUN')
We have 112 weeks of data, of which the distribution of years is the following:
df['Year'].value_counts()
Year
2023    53
2022    52
2024     4
2021     3
Name: count, dtype: int64
The dataset covers slightly more than 2 years. As we have to forecast a whole year from such a short history, this will be a challenge, and our approach will take it into account.
Exploratory Data Analysis¶
In this section we are going to analyze the data so we can understand it before applying the forecasting models.
Seasonality¶
As our data spans a bit more than 2 years, we can check if it has seasonality (using, for example, Share_volume_1):
rcParams['figure.figsize'] = 6, 4
decomposed = sm.tsa.seasonal_decompose(df["Share_volume_1"], period=52)
decomposed.plot()
plt.show()
With seasonal_decompose it is not very clear whether there is seasonality. Let's plot one feature against the week number for the 4 available calendar years to see it better:
df_2021 = df[df['Year'] == 2021]
df_2022 = df[df['Year'] == 2022]
df_2023 = df[df['Year'] == 2023]
df_2024 = df[df['Year'] == 2024]
# Plot a feature for a brand
feature = 'DP'
brand = "1"
plt.figure(figsize=(12, 4))
plt.plot(df_2021['Week_number'], df_2021[f"{feature}_{brand}"], label="2021")
plt.plot(df_2022['Week_number'], df_2022[f"{feature}_{brand}"], label="2022")
plt.plot(df_2023['Week_number'], df_2023[f"{feature}_{brand}"], label="2023")
plt.plot(df_2024['Week_number'], df_2024[f"{feature}_{brand}"], label="2024")
plt.xlabel("Week number")
plt.ylabel(feature)
plt.legend()
plt.show()
We can see clear signs of seasonality for all the targets and predictors (and for all brands). We will take advantage of this in the models.
Correlations¶
Now, how are the different features correlated? Let's plot their correlations in a heatmap for the 32 features:
# Create the heatmap
plt.figure(figsize=(18, 12))
sns.heatmap(df.corr(), annot=True, cmap='coolwarm', fmt='.1f', linewidths=0.5)
# Display the plot
plt.title("Correlation Heatmap")
plt.show()
Although there are many features, we can take some key insights:
- The share value of a brand is highly correlated with its share volume. This means that once one of them is forecast, the other can be predicted from it. We will use this in the models.
- There is high correlation between the DP and Price features.
- The targets for a brand are not necessarily correlated only with the predictors of that same brand, so an individual analysis per brand is probably not the best idea.
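As an illustration of the first insight, a series that is approximately a linear function of another can be recovered with a simple regression. The sketch below uses synthetic data with a made-up relationship, not the real shares:

```python
import numpy as np

rng = np.random.default_rng(42)
share_value = rng.uniform(10, 20, 104)
# Synthetic volume: linear in value plus small noise (made-up coefficients)
share_volume = 0.9 * share_value + 1.0 + rng.normal(0, 0.2, 104)

# Fit a degree-1 polynomial (simple linear regression)
slope, intercept = np.polyfit(share_value, share_volume, 1)
pred = slope * share_value + intercept
residual_mae = np.mean(np.abs(share_volume - pred))
print(f"slope={slope:.2f}, intercept={intercept:.2f}, residual MAE={residual_mae:.3f}")
```

When two targets are this strongly coupled, forecasting one carefully and deriving the other can be cheaper than fitting two independent models.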
Stationarity¶
Are the series stationary? Let's check it with adfuller:
stationarity_results = []
for col in df.columns:
adf_result = adfuller(df[col])
p_value = adf_result[1]
if p_value < 0.05:
stationarity_results.append([col, p_value, 'Stationary'])
else:
stationarity_results.append([col, p_value, 'Non-Stationary'])
stationarity_df = pd.DataFrame(stationarity_results, columns=['Series', 'p-value', 'Stationarity'])
print(stationarity_df)
            Series   p-value    Stationarity
0             DP_1  0.008691      Stationary
1             DP_2  0.000012      Stationary
2             DP_3  0.016800      Stationary
3             DP_4  0.000172      Stationary
4             DP_5  0.000222      Stationary
5             DP_6  0.014492      Stationary
6             DP_7  0.004929      Stationary
7             DP_8  0.001862      Stationary
8          Price_1  0.079085  Non-Stationary
9          Price_2  0.403474  Non-Stationary
10         Price_3  0.475552  Non-Stationary
11         Price_4  0.000292      Stationary
12         Price_5  0.482347  Non-Stationary
13         Price_6  0.222301  Non-Stationary
14         Price_7  0.299075  Non-Stationary
15         Price_8  0.146947  Non-Stationary
16   Share_value_1  0.000001      Stationary
17   Share_value_2  0.000003      Stationary
18   Share_value_3  0.000001      Stationary
19   Share_value_4  0.001027      Stationary
20   Share_value_5  0.000562      Stationary
21   Share_value_6  0.000318      Stationary
22   Share_value_7  0.001857      Stationary
23   Share_value_8  0.000263      Stationary
24  Share_volume_1  0.000252      Stationary
25  Share_volume_2  0.000338      Stationary
26  Share_volume_3  0.001991      Stationary
27  Share_volume_4  0.002571      Stationary
28  Share_volume_5  0.000459      Stationary
29  Share_volume_6  0.000035      Stationary
30  Share_volume_7  0.019241      Stationary
31  Share_volume_8  0.000661      Stationary
32     Week_number  0.041814      Stationary
33           Month  0.059621  Non-Stationary
34            Year  0.542540  Non-Stationary
Seven of the Price variables are non-stationary (apart from the trivial Month and Year), but none of the targets or DP predictors are. We will let the models handle this, as most of the models we are going to use do not require stationary exogenous variables.
Autocorrelations¶
In the following plots, we can see the ACF and PACF for each brand for the 4 features. However, we will not select the ARIMA parameters manually from these plots, as that is costly; instead we will search for the best model with auto_arima.
DP¶
fig, axes = plt.subplots(4, 2, figsize=(12, 8))
# Loop over the brands and plot ACF for each feature in a 4x2 grid
for i in range(1, 9):
# Plot
ax = axes[(i-1)//2, (i-1)%2]
plot_acf(df[f'DP_{i}'], ax=ax, lags=40)
ax.set_title(f'ACF of DP_{i}')
plt.tight_layout()
plt.show()
fig, axes = plt.subplots(4, 2, figsize=(12, 8))
for i in range(1, 9):
ax = axes[(i-1)//2, (i-1)%2]
plot_pacf(df[f'DP_{i}'], ax=ax, lags=40)
ax.set_title(f'PACF of DP_{i}')
plt.tight_layout()
plt.show()
Price¶
fig, axes = plt.subplots(4, 2, figsize=(12, 8))
for i in range(1, 9):
ax = axes[(i-1)//2, (i-1)%2]
plot_acf(df[f'Price_{i}'], ax=ax, lags=40)
ax.set_title(f'ACF of Price_{i}')
plt.tight_layout()
plt.show()
fig, axes = plt.subplots(4, 2, figsize=(12, 8))
for i in range(1, 9):
ax = axes[(i-1)//2, (i-1)%2]
plot_pacf(df[f'Price_{i}'], ax=ax, lags=40)
ax.set_title(f'PACF of Price_{i}')
plt.tight_layout()
plt.show()
Share Value¶
fig, axes = plt.subplots(4, 2, figsize=(12, 8))
for i in range(1, 9):
ax = axes[(i-1)//2, (i-1)%2]
plot_acf(df[f'Share_value_{i}'], ax=ax, lags=40)
ax.set_title(f'ACF of Share_value_{i}')
plt.tight_layout()
plt.show()
fig, axes = plt.subplots(4, 2, figsize=(12, 8))
for i in range(1, 9):
ax = axes[(i-1)//2, (i-1)%2]
plot_pacf(df[f'Share_value_{i}'], ax=ax, lags=40)
ax.set_title(f'PACF of Share_value_{i}')
plt.tight_layout()
plt.show()
Share Volume¶
fig, axes = plt.subplots(4, 2, figsize=(12, 8))
for i in range(1, 9):
ax = axes[(i-1)//2, (i-1)%2]
plot_acf(df[f'Share_volume_{i}'], ax=ax, lags=40)
ax.set_title(f'ACF of Share_volume_{i}')
plt.tight_layout()
plt.show()
fig, axes = plt.subplots(4, 2, figsize=(12, 8))
for i in range(1, 9):
ax = axes[(i-1)//2, (i-1)%2]
plot_pacf(df[f'Share_volume_{i}'], ax=ax, lags=40)
ax.set_title(f'PACF of Share_volume_{i}')
plt.tight_layout()
plt.show()
Feature Engineering¶
For the last part of the exploratory data analysis, we are going to add several features that can be helpful for the following models. These are the current predictors and targets:
df.columns
Index(['DP_1', 'DP_2', 'DP_3', 'DP_4', 'DP_5', 'DP_6', 'DP_7', 'DP_8',
'Price_1', 'Price_2', 'Price_3', 'Price_4', 'Price_5', 'Price_6',
'Price_7', 'Price_8', 'Share_value_1', 'Share_value_2', 'Share_value_3',
'Share_value_4', 'Share_value_5', 'Share_value_6', 'Share_value_7',
'Share_value_8', 'Share_volume_1', 'Share_volume_2', 'Share_volume_3',
'Share_volume_4', 'Share_volume_5', 'Share_volume_6', 'Share_volume_7',
'Share_volume_8', 'Week_number', 'Month', 'Year'],
dtype='object')
There is a clear sign of seasonality, so we are going to exploit it. As our final objective is to forecast 52 weeks, we do not want to depend on small lags (like 1 or 5), because recursive forecasting compounds the error with each forecasted week. Instead, we are going to use the value from the same week last year, which allows more solid long-term forecasting.
# Create lag features for Share_value and Share_volume (1 year ago)
for i in range(1, 9): # Loop over the 8 brands
df[f'Share_value_{i}_ly'] = df[f'Share_value_{i}'].shift(52)
df[f'Share_volume_{i}_ly'] = df[f'Share_volume_{i}'].shift(52)
df.columns
Index(['DP_1', 'DP_2', 'DP_3', 'DP_4', 'DP_5', 'DP_6', 'DP_7', 'DP_8',
'Price_1', 'Price_2', 'Price_3', 'Price_4', 'Price_5', 'Price_6',
'Price_7', 'Price_8', 'Share_value_1', 'Share_value_2', 'Share_value_3',
'Share_value_4', 'Share_value_5', 'Share_value_6', 'Share_value_7',
'Share_value_8', 'Share_volume_1', 'Share_volume_2', 'Share_volume_3',
'Share_volume_4', 'Share_volume_5', 'Share_volume_6', 'Share_volume_7',
'Share_volume_8', 'Week_number', 'Month', 'Year', 'Share_value_1_ly',
'Share_volume_1_ly', 'Share_value_2_ly', 'Share_volume_2_ly',
'Share_value_3_ly', 'Share_volume_3_ly', 'Share_value_4_ly',
'Share_volume_4_ly', 'Share_value_5_ly', 'Share_volume_5_ly',
'Share_value_6_ly', 'Share_volume_6_ly', 'Share_value_7_ly',
'Share_volume_7_ly', 'Share_value_8_ly', 'Share_volume_8_ly'],
dtype='object')
Performance Evaluation¶
Before we start developing the models, we need a way to compare them. As we have 112 weeks of data and we want to exploit the seasonality, we need at least two full years (that is, 104 weeks) for the training set. This leaves us with just 8 weeks for the test set. However, as we said, we are relying on seasonality rather than small lags, so we do not need a test set close to 52 weeks to make a good assessment of the performance. If we used small lags, the error over 8 forecasted weeks could be small yet grow dramatically when forecasting 52 weeks. Since our models will exploit the seasonality instead, 8 weeks, although small, is a valid evaluation window.
targets = ['Share_value', 'Share_volume']
results_df = pd.DataFrame()
L = 104  # Length of the training set (2 years)
train = df.iloc[:L]    # First 104 rows for training
test = df.iloc[L:112]  # Last 8 rows for testing
print(len(train), len(test))
104 8
We are going to drop the predictors from the test set, as we will not have the data in the real case.
dp_price_columns = [f'DP_{i}' for i in range(1, 9)] + [f'Price_{i}' for i in range(1, 9)]
dp_price_true_df = test[dp_price_columns]  # Keep the true values for later comparison
test = test.drop(columns=dp_price_columns, errors='ignore')
test.columns
Index(['Share_value_1', 'Share_value_2', 'Share_value_3', 'Share_value_4',
'Share_value_5', 'Share_value_6', 'Share_value_7', 'Share_value_8',
'Share_volume_1', 'Share_volume_2', 'Share_volume_3', 'Share_volume_4',
'Share_volume_5', 'Share_volume_6', 'Share_volume_7', 'Share_volume_8',
'Week_number', 'Month', 'Year', 'Share_value_1_ly', 'Share_volume_1_ly',
'Share_value_2_ly', 'Share_volume_2_ly', 'Share_value_3_ly',
'Share_volume_3_ly', 'Share_value_4_ly', 'Share_volume_4_ly',
'Share_value_5_ly', 'Share_volume_5_ly', 'Share_value_6_ly',
'Share_volume_6_ly', 'Share_value_7_ly', 'Share_volume_7_ly',
'Share_value_8_ly', 'Share_volume_8_ly'],
dtype='object')
Here we can see an example of the split done:
feature = 'Share_value'
brand = "1"
plt.figure(figsize=(12, 4))
plt.plot(train.index, train[f"{feature}_{brand}"], label="Train")
plt.plot(test.index, test[f"{feature}_{brand}"], label="Test")
plt.xlabel("Date")
plt.ylabel(feature)
plt.legend()
plt.show()
Naive Models¶
The first models that we are going to develop are the "naive" models. They will serve as a baseline for comparing with the rest of them.
Naive Median¶
The first model will use the median of the training series as the forecast (since the median minimizes the MAE, our performance metric). Here we can see an example:
feature = 'Share_value'
brand = "1"
median_value = train[f"{feature}_{brand}"].median()
forecast = [median_value] * len(test)
plt.figure(figsize=(12, 4))
plt.plot(train.index, train[f"{feature}_{brand}"], label="Train")
plt.plot(test.index, test[f"{feature}_{brand}"], label="Test")
plt.plot(test.index, forecast, label="Forecast")
plt.xlabel("Date")
plt.ylabel(f"{feature}_{brand}")
plt.legend()
plt.show()
And here we compute it for all 16 targets and store the resulting MAE:
maes = []
for feature in targets:
for brand in range(1, 8+1):
median_value = train[f"{feature}_{brand}"].median()
forecast = [median_value] * len(test)
maes.append(mean_absolute_error(test[f"{feature}_{brand}"], forecast))
mae = sum(maes) / len(maes)
print(f'MAE = {mae}')
results_df.loc['Naive Median', 'MAE'] = mae
MAE = 1.5345234374999999
Naive Seasonal¶
This naive model will use the value from the last year of each target for the forecast:
maes = []
for feature in targets:
    for brand in range(1, 8+1):
        # Use the value from the same week last year (the _ly lag feature)
        forecast = test[f"{feature}_{brand}_ly"].values
        # Compute MAE
        maes.append(mean_absolute_error(test[f"{feature}_{brand}"], forecast))
mae = sum(maes) / len(maes)
print(f'MAE = {mae}')
results_df.loc['Naive Seasonal', 'MAE'] = mae
MAE = 1.7515273437499999
Now, our objective is to get a lower MAE from our forecasting models.
Automatic Models¶
These are the "simplest" models that we are going to use, as they do not need any predictors for the target (just the past target values). Also, they allow us to visualize their confidence intervals, which is very useful.
ETS¶
As we said, we want to take advantage of the seasonality, so ETS is promising for our data. For each model, we will first run a test on one target to see how it works. Then, we will forecast all 16 targets and also visualize the resulting forecasts, comparing them to the real test values.
Testing for one target¶
feature = 'Share_value'
brand = "1"
model_ets = ETSModel(
train[f"{feature}_{brand}"],
trend='add',
seasonal='add',
seasonal_periods=52
).fit()
pred = model_ets.get_prediction(start = test.index[0], end = test.index[-1])
cis = pred.pred_int(alpha = .05) #confidence interval
lower_ci = cis['lower PI (alpha=0.050000)']
upper_ci = cis['upper PI (alpha=0.050000)']
forecast = model_ets.forecast(steps=len(test))
plt.figure(figsize=(12, 4))
plt.plot(train.index, train[f"{feature}_{brand}"], label="Train")
plt.plot(test.index, test[f"{feature}_{brand}"], label="Test")
plt.plot(test.index, forecast, label="Forecast")
plt.fill_between(test.index, lower_ci, upper_ci, color='red', alpha=0.2, label='95% Confidence Interval')
plt.xlabel("Date")
plt.ylabel(f"{feature}_{brand}")
plt.legend()
plt.grid()
plt.show()
Forecasting all targets¶
maes = []
maes_ets = {}
forecasts_value = {}
forecasts_value_lci = {}
forecasts_value_uci = {}
forecasts_volume = {}
forecasts_volume_lci = {}
forecasts_volume_uci = {}
forecasts_ets = {}
for feature in targets:
for brand in range(1, 8+1):
model_ets = ETSModel(
train[f"{feature}_{brand}"],
trend='add',
seasonal='add',
seasonal_periods=52
).fit()
pred = model_ets.get_prediction(start = test.index[0], end = test.index[-1])
cis = pred.pred_int(alpha = .05) #confidence interval
lower_ci = cis['lower PI (alpha=0.050000)']
upper_ci = cis['upper PI (alpha=0.050000)']
forecast = model_ets.forecast(steps=len(test))
forecasts_ets[f"{feature}_{brand}"] = forecast
if feature == 'Share_value':
forecasts_value[f"{feature}_{brand}"] = forecast
forecasts_value_lci[f"{feature}_{brand}"] = lower_ci
forecasts_value_uci[f"{feature}_{brand}"] = upper_ci
else:
forecasts_volume[f"{feature}_{brand}"] = forecast
forecasts_volume_lci[f"{feature}_{brand}"] = lower_ci
forecasts_volume_uci[f"{feature}_{brand}"] = upper_ci
maes.append(mean_absolute_error(test[f"{feature}_{brand}"], forecast))
maes_ets[f"{feature}_{brand}"] = mean_absolute_error(test[f"{feature}_{brand}"], forecast)
mae = sum(maes) / len(maes)
print(f'MAE = {mae:.3f}')
results_df.loc['ETS', 'MAE'] = mae
MAE = 0.709
Predictions for Share_value:
# Create a 2x4 subplot for Share_value forecasts
fig, axs = plt.subplots(2, 4, figsize=(16, 6))
axs = axs.flatten() # Flatten the array of axes for easier indexing
# Loop through each brand and plot the forecasts
for brand in range(1, 9): # Brands are 1 to 8
# Define the feature and the key for the forecast
feature = "Share_value"
forecast_key = f"{feature}_{brand}"
# Get the forecast for the current brand
forecast = forecasts_value[forecast_key]
lower_ci = forecasts_value_lci[forecast_key]
upper_ci = forecasts_value_uci[forecast_key]
# Get the corresponding train and test data
train_data = train[f"{feature}_{brand}"]
test_data = test[f"{feature}_{brand}"]
# Select the appropriate axis for this brand
ax = axs[brand - 1]
# Plot the train, test, and forecast values
ax.plot(train_data.index, train_data, label="Train", color='blue')
ax.plot(test_data.index, test_data, label="Test", color='green')
ax.plot(test_data.index, forecast, label="Forecast", color='red')
ax.fill_between(test.index, lower_ci, upper_ci, color='red', alpha=0.2, label='95% Confidence Interval')
# Set labels and title
ax.set_xlabel("Date")
ax.set_ylabel(f"{feature}_{brand}")
#ax.set_title(f"Brand {brand}")
ax.legend()
# Set x-axis limits
ax.set_xlim(pd.to_datetime(['2023-10-22', '2024-01-28']))
# Adjust layout to prevent overlapping labels
plt.tight_layout()
plt.show()
Predictions for Share_volume:
# Create a 2x4 subplot for Share_volume forecasts
fig, axs = plt.subplots(2, 4, figsize=(16, 6))
axs = axs.flatten() # Flatten the array of axes for easier indexing
# Loop through each brand and plot the forecasts
for brand in range(1, 9): # Brands are 1 to 8
# Define the feature and the key for the forecast
feature = "Share_volume"
forecast_key = f"{feature}_{brand}"
# Get the forecast for the current brand
forecast = forecasts_volume[forecast_key]
lower_ci = forecasts_volume_lci[forecast_key]
upper_ci = forecasts_volume_uci[forecast_key]
# Get the corresponding train and test data
train_data = train[f"{feature}_{brand}"]
test_data = test[f"{feature}_{brand}"]
# Select the appropriate axis for this brand
ax = axs[brand - 1]
# Plot the train, test, and forecast values
ax.plot(train_data.index, train_data, label="Train", color='blue')
ax.plot(test_data.index, test_data, label="Test", color='green')
ax.plot(test_data.index, forecast, label="Forecast", color='red')
ax.fill_between(test.index, lower_ci, upper_ci, color='red', alpha=0.2, label='95% Confidence Interval')
# Set labels and title
ax.set_xlabel("Date")
ax.set_ylabel(f"{feature}_{brand}")
#ax.set_title(f"Brand {brand}")
ax.legend()
# Set x-axis limits
ax.set_xlim(pd.to_datetime(['2023-10-22', '2024-01-28']))
# Adjust layout to prevent overlapping labels
plt.tight_layout()
plt.show()
We got an MAE of approximately 0.7, a great improvement over the naive models. This is a very simple model that only uses past target values, yet its forecasts work very well for many targets (while also providing confidence intervals). Remember that ETS has a strong seasonal component, so although our test set is small compared to the final forecast horizon, its performance probably will not degrade significantly over the 52 weeks.
SARIMA¶
Apart from ETS, we are also going to use an ARIMA model. As we said, we want to use seasonality, so we will use SARIMA. To select the parameters, we will run auto_arima separately for each of the targets, because the targets seem to behave differently from one another.
Testing for one target¶
feature = 'Share_value'
brand = "1"
model_arima = pm.auto_arima(
train[f"{feature}_{brand}"],
seasonal=True,
m=52,
D=0,
trace=True,
error_action='ignore',
suppress_warnings=True,
stepwise=True
)
Performing stepwise search to minimize aic
 ARIMA(2,0,2)(1,0,1)[52] intercept   : AIC=321.856, Time=5.13 sec
 ARIMA(0,0,0)(0,0,0)[52] intercept   : AIC=353.292, Time=0.01 sec
 ARIMA(1,0,0)(1,0,0)[52] intercept   : AIC=317.351, Time=1.84 sec
 ARIMA(0,0,1)(0,0,1)[52] intercept   : AIC=321.811, Time=0.62 sec
 ARIMA(0,0,0)(0,0,0)[52]             : AIC=869.689, Time=0.01 sec
 ARIMA(1,0,0)(0,0,0)[52] intercept   : AIC=315.421, Time=0.04 sec
 ARIMA(1,0,0)(0,0,1)[52] intercept   : AIC=317.361, Time=2.91 sec
 ARIMA(1,0,0)(1,0,1)[52] intercept   : AIC=inf, Time=2.80 sec
 ARIMA(2,0,0)(0,0,0)[52] intercept   : AIC=314.883, Time=0.05 sec
 ARIMA(2,0,0)(1,0,0)[52] intercept   : AIC=315.948, Time=3.28 sec
 ARIMA(2,0,0)(0,0,1)[52] intercept   : AIC=316.025, Time=1.86 sec
 ARIMA(2,0,0)(1,0,1)[52] intercept   : AIC=317.970, Time=1.60 sec
 ARIMA(3,0,0)(0,0,0)[52] intercept   : AIC=316.883, Time=0.04 sec
 ARIMA(2,0,1)(0,0,0)[52] intercept   : AIC=316.884, Time=0.05 sec
 ARIMA(1,0,1)(0,0,0)[52] intercept   : AIC=315.176, Time=0.03 sec
 ARIMA(3,0,1)(0,0,0)[52] intercept   : AIC=318.881, Time=0.04 sec
 ARIMA(2,0,0)(0,0,0)[52]             : AIC=inf, Time=0.01 sec

Best model:  ARIMA(2,0,0)(0,0,0)[52] intercept
Total fit time: 20.337 seconds
forecast, conf_int = model_arima.predict(8, return_conf_int=True, alpha=0.05)
lower_ci = conf_int[:,0]
upper_ci = conf_int[:,1]
plt.figure(figsize=(12, 4))
plt.plot(train.index, train[f"{feature}_{brand}"], label="Train")
plt.plot(test.index, test[f"{feature}_{brand}"], label="Test")
plt.plot(test.index, forecast, label="Forecast")
plt.fill_between(test.index, lower_ci, upper_ci, color='red', alpha=0.2, label='95% Confidence Interval')
plt.xlabel("Date")
plt.ylabel(f"{feature}_{brand}")
plt.legend()
plt.grid()
plt.show()
Forecasting all targets¶
maes = []
forecasts_value = {}
forecasts_value_lci = {}
forecasts_value_uci = {}
forecasts_volume = {}
forecasts_volume_lci = {}
forecasts_volume_uci = {}
for feature in targets:
for brand in range(1, 8+1):
model_arima = pm.auto_arima(
train[f"{feature}_{brand}"],
seasonal=True,
m=52,
D=0,
trace=False,
error_action='ignore',
suppress_warnings=True,
stepwise=True
)
forecast, conf_int = model_arima.predict(8, return_conf_int=True, alpha=0.05)
lower_ci = conf_int[:,0]
upper_ci = conf_int[:,1]
if feature == 'Share_value':
forecasts_value[f"{feature}_{brand}"] = forecast
forecasts_value_lci[f"{feature}_{brand}"] = lower_ci
forecasts_value_uci[f"{feature}_{brand}"] = upper_ci
else:
forecasts_volume[f"{feature}_{brand}"] = forecast
forecasts_volume_lci[f"{feature}_{brand}"] = lower_ci
forecasts_volume_uci[f"{feature}_{brand}"] = upper_ci
maes.append(mean_absolute_error(test[f"{feature}_{brand}"], forecast))
mae = sum(maes) / len(maes)
print(f'MAE = {mae:.3f}')
results_df.loc['SARIMA', 'MAE'] = mae
MAE = 1.085
# Create a 2x4 subplot for Share_value forecasts
fig, axs = plt.subplots(2, 4, figsize=(16, 6))
axs = axs.flatten() # Flatten the array of axes for easier indexing
# Loop through each brand and plot the forecasts
for brand in range(1, 9): # Brands are 1 to 8
# Define the feature and the key for the forecast
feature = "Share_value"
forecast_key = f"{feature}_{brand}"
# Get the forecast for the current brand
forecast = forecasts_value[forecast_key]
lower_ci = forecasts_value_lci[forecast_key]
upper_ci = forecasts_value_uci[forecast_key]
# Get the corresponding train and test data
train_data = train[f"{feature}_{brand}"]
test_data = test[f"{feature}_{brand}"]
# Select the appropriate axis for this brand
ax = axs[brand - 1]
# Plot the train, test, and forecast values
ax.plot(train_data.index, train_data, label="Train", color='blue')
ax.plot(test_data.index, test_data, label="Test", color='green')
ax.plot(test_data.index, forecast, label="Forecast", color='red')
ax.fill_between(test.index, lower_ci, upper_ci, color='red', alpha=0.2, label='95% Confidence Interval')
# Set labels and title
ax.set_xlabel("Date")
ax.set_ylabel(f"{feature}_{brand}")
#ax.set_title(f"Brand {brand}")
ax.legend()
# Set x-axis limits
ax.set_xlim(pd.to_datetime(['2023-10-22', '2024-01-28']))
# Adjust layout to prevent overlapping labels
plt.tight_layout()
plt.show()
# Create a 2x4 subplot for Share_volume forecasts
fig, axs = plt.subplots(2, 4, figsize=(16, 6))
axs = axs.flatten() # Flatten the array of axes for easier indexing
# Loop through each brand and plot the forecasts
for brand in range(1, 9): # Brands are 1 to 8
# Define the feature and the key for the forecast
feature = "Share_volume"
forecast_key = f"{feature}_{brand}"
# Get the forecast for the current brand
forecast = forecasts_volume[forecast_key]
lower_ci = forecasts_volume_lci[forecast_key]
upper_ci = forecasts_volume_uci[forecast_key]
# Get the corresponding train and test data
train_data = train[f"{feature}_{brand}"]
test_data = test[f"{feature}_{brand}"]
# Select the appropriate axis for this brand
ax = axs[brand - 1]
# Plot the train, test, and forecast values
ax.plot(train_data.index, train_data, label="Train", color='blue')
ax.plot(test_data.index, test_data, label="Test", color='green')
ax.plot(test_data.index, forecast, label="Forecast", color='red')
ax.fill_between(test.index, lower_ci, upper_ci, color='red', alpha=0.2, label='95% Confidence Interval')
# Set labels and title
ax.set_xlabel("Date")
ax.set_ylabel(f"{feature}_{brand}")
#ax.set_title(f"Brand {brand}")
ax.legend()
# Set x-axis limits
ax.set_xlim(pd.to_datetime(['2023-10-22', '2024-01-28']))
# Adjust layout to prevent overlapping labels
plt.tight_layout()
plt.show()
Although SARIMA works well for some targets, it is not capable of predicting others very well, so ETS is clearly the better automatic model. SARIMA also required much more computation time.
Adding Exogenous Variables¶
We have used models that only use the past values for each target. However, in our dataset we have the DP and Price predictors, which show correlation with the targets and can improve the results.
Predictors analysis¶
Let's first do an analysis of the predictors.
Correlations¶
Let's do a quick and final analysis of the correlations between the features:
Correlation with targets¶
# Create the heatmap
plt.figure(figsize=(18, 12))
sns.heatmap(train.corr(), annot=False, cmap='coolwarm', fmt='.1f', linewidths=0.5)
# Display the plot
plt.title("Correlation Heatmap")
plt.show()
Although this heatmap is quite difficult to read, we can extract a key insight: our predictors (DP, Price, last-year values and time features) show high correlations with many targets, and these correlations often cross brands, so we need to find which predictors are significant for each target. We will do this in this section with the dynamic regression model.
Lagged Variables¶
As a final check, let's see if there is high correlation with the lags of the predictors. For this, we will compute lags 1, 2 and 4, and plot them in 4 different heatmaps (compared to the lag-0 values). Our objective is to see whether any of the correlations increase with the lags.
lags = [1, 2, 4]
df_lag = df.copy()
for lag in lags:
for i in range(1, 9): # 8 brands
df_lag[f'DP_{i}_lag{lag}'] = df_lag[f'DP_{i}'].shift(lag)
df_lag[f'Price_{i}_lag{lag}'] = df_lag[f'Price_{i}'].shift(lag)
# Lag 0
plt.figure(figsize=(10, 6))
# Select columns for the current lag
cols = [f'DP_{i}' for i in range(1, 9)] + \
[f'Price_{i}' for i in range(1, 9)] + \
[f'Share_value_{i}' for i in range(1, 9)] + \
[f'Share_volume_{i}' for i in range(1, 9)]
df_subset = df_lag[cols]
# Compute correlation
corr_matrix = df_subset.corr()
# Plot heatmap
sns.heatmap(corr_matrix, annot=False, fmt=".1f", cmap="coolwarm", linewidths=0.5)
plt.title(f'Correlation Heatmap for Lag 0')
plt.show()
# Plot correlation heatmap for each lag
for lag in lags:
plt.figure(figsize=(10, 6))
# Select columns for the current lag
cols = [f'DP_{i}_lag{lag}' for i in range(1, 9)] + \
[f'Price_{i}_lag{lag}' for i in range(1, 9)] + \
[f'Share_value_{i}' for i in range(1, 9)] + \
[f'Share_volume_{i}' for i in range(1, 9)]
df_subset = df_lag[cols]
# Compute correlation
corr_matrix = df_subset.corr()
# Plot heatmap
sns.heatmap(corr_matrix, annot=False, fmt=".1f", cmap="coolwarm", linewidths=0.5)
plt.title(f'Correlation Heatmap for Lag {lag}')
plt.show()
As we can see, the larger the lag, the weaker the correlation. This is not a concern for us because, as we said, we are exploiting the seasonality rather than these short lags, so our seasonal models are probably the best options.
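This decay is what one would expect from autocorrelated series. As a minimal, self-contained sketch (using a synthetic AR(1) series, not our dataset), the correlation with a lagged copy shrinks roughly geometrically with the lag:

```python
import numpy as np
import pandas as pd

# Synthetic AR(1) series: corr(x_t, x_{t-k}) decays roughly as phi**k
rng = np.random.default_rng(0)
phi, n = 0.8, 500
x = np.zeros(n)
for t in range(1, n):
    x[t] = phi * x[t - 1] + rng.normal()
s = pd.Series(x)

for lag in [1, 2, 4]:
    print(f"lag {lag}: corr = {s.corr(s.shift(lag)):.2f}  (theory: {phi ** lag:.2f})")
```

The same geometric decay explains why the lag-1, lag-2 and lag-4 heatmaps above fade progressively.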
Forecasting Predictors¶
As we are going to use the predictors to forecast the targets, we first need to forecast the predictors themselves. The most effective automatic method was ETS, so we will use it to forecast the DP and Price features. This means that our final exogenous variables will be:
- Forecasted DP and Price features
- Week_number, Month, Year
- Value last year
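Assuming the `_ly` ("value last year") features were built with a 52-week shift — a minimal sketch on a hypothetical weekly series, not the real dataset — the first year is necessarily NaN:

```python
import numpy as np
import pandas as pd

# Hypothetical weekly series; the "value last year" feature is a 52-week shift
idx = pd.date_range('2021-12-13', periods=110, freq='W-MON')
df_toy = pd.DataFrame({'Share_value_1': np.arange(110, dtype=float)}, index=idx)
df_toy['Share_value_1_ly'] = df_toy['Share_value_1'].shift(52)

# The first 52 weeks have no last-year value, hence the NaNs
print(df_toy['Share_value_1_ly'].isna().sum())  # 52
```

These leading NaNs are the reason some models later have to drop the first year of training rows.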
Now, forecast DP and Price:
train.columns
Index(['DP_1', 'DP_2', 'DP_3', 'DP_4', 'DP_5', 'DP_6', 'DP_7', 'DP_8',
'Price_1', 'Price_2', 'Price_3', 'Price_4', 'Price_5', 'Price_6',
'Price_7', 'Price_8', 'Share_value_1', 'Share_value_2', 'Share_value_3',
'Share_value_4', 'Share_value_5', 'Share_value_6', 'Share_value_7',
'Share_value_8', 'Share_volume_1', 'Share_volume_2', 'Share_volume_3',
'Share_volume_4', 'Share_volume_5', 'Share_volume_6', 'Share_volume_7',
'Share_volume_8', 'Week_number', 'Month', 'Year', 'Share_value_1_ly',
'Share_volume_1_ly', 'Share_value_2_ly', 'Share_volume_2_ly',
'Share_value_3_ly', 'Share_volume_3_ly', 'Share_value_4_ly',
'Share_volume_4_ly', 'Share_value_5_ly', 'Share_volume_5_ly',
'Share_value_6_ly', 'Share_volume_6_ly', 'Share_value_7_ly',
'Share_volume_7_ly', 'Share_value_8_ly', 'Share_volume_8_ly'],
dtype='object')
test.columns
Index(['Share_value_1', 'Share_value_2', 'Share_value_3', 'Share_value_4',
'Share_value_5', 'Share_value_6', 'Share_value_7', 'Share_value_8',
'Share_volume_1', 'Share_volume_2', 'Share_volume_3', 'Share_volume_4',
'Share_volume_5', 'Share_volume_6', 'Share_volume_7', 'Share_volume_8',
'Week_number', 'Month', 'Year', 'Share_value_1_ly', 'Share_volume_1_ly',
'Share_value_2_ly', 'Share_volume_2_ly', 'Share_value_3_ly',
'Share_volume_3_ly', 'Share_value_4_ly', 'Share_volume_4_ly',
'Share_value_5_ly', 'Share_volume_5_ly', 'Share_value_6_ly',
'Share_volume_6_ly', 'Share_value_7_ly', 'Share_volume_7_ly',
'Share_value_8_ly', 'Share_volume_8_ly'],
dtype='object')
maes = []
forecasts_dp = {}
forecasts_dp_lci = {}
forecasts_dp_uci = {}
forecasts_price = {}
forecasts_price_lci = {}
forecasts_price_uci = {}
features = ['DP', 'Price']
for feature in features:
for brand in range(1, 8+1):
model_ets = ETSModel(
train[f"{feature}_{brand}"],
trend='add',
seasonal='add',
seasonal_periods=52
).fit()
pred = model_ets.get_prediction(start = test.index[0], end = test.index[-1])
        cis = pred.pred_int(alpha=.05)  # 95% prediction interval
lower_ci = cis['lower PI (alpha=0.050000)']
upper_ci = cis['upper PI (alpha=0.050000)']
forecast = model_ets.forecast(steps=len(test))
test.loc[:, f"{feature}_{brand}"] = forecast
if feature == 'DP':
forecasts_dp[f"{feature}_{brand}"] = forecast
forecasts_dp_lci[f"{feature}_{brand}"] = lower_ci
forecasts_dp_uci[f"{feature}_{brand}"] = upper_ci
else:
forecasts_price[f"{feature}_{brand}"] = forecast
forecasts_price_lci[f"{feature}_{brand}"] = lower_ci
forecasts_price_uci[f"{feature}_{brand}"] = upper_ci
maes.append(mean_absolute_error(dp_price_true_df[f"{feature}_{brand}"], forecast))
test.columns
Index(['Share_value_1', 'Share_value_2', 'Share_value_3', 'Share_value_4',
'Share_value_5', 'Share_value_6', 'Share_value_7', 'Share_value_8',
'Share_volume_1', 'Share_volume_2', 'Share_volume_3', 'Share_volume_4',
'Share_volume_5', 'Share_volume_6', 'Share_volume_7', 'Share_volume_8',
'Week_number', 'Month', 'Year', 'Share_value_1_ly', 'Share_volume_1_ly',
'Share_value_2_ly', 'Share_volume_2_ly', 'Share_value_3_ly',
'Share_volume_3_ly', 'Share_value_4_ly', 'Share_volume_4_ly',
'Share_value_5_ly', 'Share_volume_5_ly', 'Share_value_6_ly',
'Share_volume_6_ly', 'Share_value_7_ly', 'Share_volume_7_ly',
'Share_value_8_ly', 'Share_volume_8_ly', 'DP_1', 'DP_2', 'DP_3', 'DP_4',
'DP_5', 'DP_6', 'DP_7', 'DP_8', 'Price_1', 'Price_2', 'Price_3',
'Price_4', 'Price_5', 'Price_6', 'Price_7', 'Price_8'],
dtype='object')
mae = sum(maes) / len(maes)
print(f'MAE = {mae:.3f}')
MAE = 1.218
# Create a 2x4 subplot grid for DP forecasts
fig, axs = plt.subplots(2, 4, figsize=(16, 6))
axs = axs.flatten() # Flatten the array of axes for easier indexing
# Loop through each brand and plot the forecasts
for brand in range(1, 9): # Brands are 1 to 8
# Define the feature and the key for the forecast
feature = "DP"
forecast_key = f"{feature}_{brand}"
# Get the forecast for the current brand
forecast = forecasts_dp[forecast_key]
lower_ci = forecasts_dp_lci[forecast_key]
upper_ci = forecasts_dp_uci[forecast_key]
# Get the corresponding train and test data
train_data = train[f"{feature}_{brand}"]
test_data = dp_price_true_df[f"{feature}_{brand}"]
# Select the appropriate axis for this brand
ax = axs[brand - 1]
# Plot the train, test, and forecast values
ax.plot(train_data.index, train_data, label="Train", color='blue')
ax.plot(test_data.index, test_data, label="Test", color='green')
ax.plot(test_data.index, forecast, label="Forecast", color='red')
ax.fill_between(test.index, lower_ci, upper_ci, color='red', alpha=0.2, label='95% Confidence Interval')
# Set labels and title
ax.set_xlabel("Date")
ax.set_ylabel(f"{feature}_{brand}")
#ax.set_title(f"Brand {brand}")
ax.legend()
# Set x-axis limits
ax.set_xlim(pd.to_datetime(['2023-10-22', '2024-01-28']))
# Adjust layout to prevent overlapping labels
plt.tight_layout()
plt.show()
# Create a 2x4 subplot grid for Price forecasts
fig, axs = plt.subplots(2, 4, figsize=(16, 6))
axs = axs.flatten() # Flatten the array of axes for easier indexing
# Loop through each brand and plot the forecasts
for brand in range(1, 9): # Brands are 1 to 8
# Define the feature and the key for the forecast
feature = "Price"
forecast_key = f"{feature}_{brand}"
# Get the forecast for the current brand
forecast = forecasts_price[forecast_key]
lower_ci = forecasts_price_lci[forecast_key]
upper_ci = forecasts_price_uci[forecast_key]
# Get the corresponding train and test data
train_data = train[f"{feature}_{brand}"]
test_data = dp_price_true_df[f"{feature}_{brand}"]
# Select the appropriate axis for this brand
ax = axs[brand - 1]
# Plot the train, test, and forecast values
ax.plot(train_data.index, train_data, label="Train", color='blue')
ax.plot(test_data.index, test_data, label="Test", color='green')
ax.plot(test_data.index, forecast, label="Forecast", color='red')
ax.fill_between(test.index, lower_ci, upper_ci, color='red', alpha=0.2, label='95% Confidence Interval')
# Set labels and title
ax.set_xlabel("Date")
ax.set_ylabel(f"{feature}_{brand}")
#ax.set_title(f"Brand {brand}")
ax.legend()
# Set x-axis limits
ax.set_xlim(pd.to_datetime(['2023-10-22', '2024-01-28']))
# Adjust layout to prevent overlapping labels
plt.tight_layout()
plt.show()
We can see that the ETS model does a good job of forecasting these variables. However, we can no longer trust the confidence intervals of the downstream models, as they will not account for the uncertainty in these forecasted predictors. We will check whether forecasting DP and Price with ETS and then forecasting the targets with another model is better for some targets.
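One way to recover some of that uncertainty, if it were needed, would be a simple Monte Carlo pass: sample the exogenous forecasts within their intervals and propagate the draws through the target model. The sketch below is entirely hypothetical (toy point forecasts and a toy linear target model), not part of our pipeline:

```python
import numpy as np

rng = np.random.default_rng(42)

# Toy point forecast and 95% interval for one exogenous series (hypothetical)
forecast = np.array([17.5, 17.6, 17.4])
lower, upper = forecast - 1.0, forecast + 1.0

# Treat the forecast as normal: a 95% interval is roughly mean +/- 1.96 sigma
sigma = (upper - lower) / (2 * 1.96)

# Hypothetical downstream target model: target = a + b * exog
a, b = 2.0, 0.5
draws = rng.normal(loc=forecast, scale=sigma, size=(2000, forecast.size))
target_draws = a + b * draws

# Empirical 95% band for the target, now reflecting exogenous uncertainty
band = np.percentile(target_draws, [2.5, 97.5], axis=0)
print(band.round(2))
```

The resulting band is wider than the target model's own interval because it also carries the predictor-forecast error.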
Now, with all of the predictors computed, we can fit the simplest possible model that uses them: dynamic regression. This model is also very important because it will help us determine which predictors are significant for each target (as we saw in the correlation heatmap, this is not trivial and changes a lot from target to target).
Testing Dynamic Regression¶
For Share_value_1:
To predict each target, we will use all the predictors (DP and Price features, Week_number, Month and Year) plus its value from the previous year (to capture seasonality). Since linear regression cannot handle NaN values, we will drop the corresponding rows of the train dataset (the first of the two years). This is fine because this method does not need two full years to account for seasonality.
feature = 'Share_value'
brand = "1"
predictors = [
'DP_1', 'DP_2', 'DP_3', 'DP_4', 'DP_5', 'DP_6', 'DP_7', 'DP_8',
'Price_1', 'Price_2', 'Price_3', 'Price_4', 'Price_5', 'Price_6',
'Price_7', 'Price_8', 'Week_number', 'Month', 'Year', f"{feature}_{brand}_ly"
]
# Drop rows with missing values
train_model = train[[f"{feature}_{brand}"] + predictors].dropna()
# Define the predictors (X) and target variable (y)
X_train_wi = train_model[predictors]
y_train = train_model[f"{feature}_{brand}"]
# Add a constant for the intercept
X_train = sm.add_constant(X_train_wi)
# Fit the OLS regression model
model = sm.OLS(y_train, X_train).fit()
And from the model summary:
print(model.summary())
OLS Regression Results
==============================================================================
Dep. Variable: Share_value_1 R-squared: 0.677
Model: OLS Adj. R-squared: 0.469
Method: Least Squares F-statistic: 3.249
Date: Tue, 18 Mar 2025 Prob (F-statistic): 0.00161
Time: 18:02:54 Log-Likelihood: -56.761
No. Observations: 52 AIC: 155.5
Df Residuals: 31 BIC: 196.5
Df Model: 20
Covariance Type: nonrobust
====================================================================================
coef std err t P>|t| [0.025 0.975]
------------------------------------------------------------------------------------
const -1156.4549 3417.263 -0.338 0.737 -8126.008 5813.098
DP_1 0.3201 0.355 0.902 0.374 -0.403 1.044
DP_2 0.1311 0.118 1.114 0.274 -0.109 0.371
DP_3 -0.3677 0.314 -1.171 0.251 -1.008 0.273
DP_4 -0.0554 0.142 -0.389 0.700 -0.345 0.235
DP_5 -0.1684 0.161 -1.048 0.303 -0.496 0.159
DP_6 -0.3287 0.150 -2.185 0.037 -0.636 -0.022
DP_7 -0.0296 0.155 -0.191 0.850 -0.345 0.286
DP_8 0.1883 0.188 1.001 0.324 -0.195 0.572
Price_1 -0.9878 0.514 -1.922 0.064 -2.036 0.060
Price_2 -0.0600 0.174 -0.344 0.733 -0.416 0.296
Price_3 0.8338 0.674 1.238 0.225 -0.540 2.208
Price_4 0.2848 0.398 0.715 0.480 -0.528 1.097
Price_5 1.3035 0.563 2.316 0.027 0.155 2.452
Price_6 -0.2788 0.541 -0.515 0.610 -1.383 0.825
Price_7 0.3696 2.018 0.183 0.856 -3.746 4.486
Price_8 0.2306 0.435 0.531 0.599 -0.656 1.117
Week_number 0.2617 0.118 2.214 0.034 0.021 0.503
Month -1.0389 0.511 -2.034 0.051 -2.081 0.003
Year 0.5677 1.691 0.336 0.739 -2.881 4.016
Share_value_1_ly 0.7739 0.173 4.486 0.000 0.422 1.126
==============================================================================
Omnibus: 3.608 Durbin-Watson: 1.765
Prob(Omnibus): 0.165 Jarque-Bera (JB): 3.054
Skew: 0.593 Prob(JB): 0.217
Kurtosis: 3.032 Cond. No. 5.36e+07
==============================================================================
Notes:
[1] Standard Errors assume that the covariance matrix of the errors is correctly specified.
[2] The condition number is large, 5.36e+07. This might indicate that there are
strong multicollinearity or other numerical problems.
We can get the significant predictors (p-value lower than 0.05):
# Get the p-values and coefficients
p_values = model.pvalues
coefficients = model.params
# Select only the variables with p-value < 0.05
significant_vars = p_values[p_values < 0.05].index
# Create a dataframe with coefficients and p-values for significant variables
significant_results = pd.DataFrame({
'Coefficient': coefficients[significant_vars],
'P-Value': p_values[significant_vars]
})
# Display the significant variables
print(significant_results)
                  Coefficient   P-Value
DP_6                -0.328724  0.036569
Price_5              1.303498  0.027365
Week_number          0.261724  0.034287
Share_value_1_ly     0.773921  0.000093
# Select the same features as in training
X_test_wi = test[predictors]
y_test = test['Share_value_1']
X_test = sm.add_constant(X_test_wi)
y_pred = model.predict(X_test)
mae = mean_absolute_error(y_test, y_pred)
print(f"Mean Absolute Error (MAE): {mae:.3f}")
Mean Absolute Error (MAE): 3.046
Dynamic Regression¶
Let's now apply these steps for all the targets:
maes = []
for feature in targets:
for brand in range(1, 8+1):
# Define predictors and fit model
predictors = [
'DP_1', 'DP_2', 'DP_3', 'DP_4', 'DP_5', 'DP_6', 'DP_7', 'DP_8',
'Price_1', 'Price_2', 'Price_3', 'Price_4', 'Price_5', 'Price_6',
'Price_7', 'Price_8', 'Week_number', 'Month', 'Year', f"{feature}_{brand}_ly"
]
train_model = train[[f"{feature}_{brand}"] + predictors].dropna()
X_train_wi = train_model[predictors]
y_train = train_model[f"{feature}_{brand}"]
X_train = sm.add_constant(X_train_wi)
model = sm.OLS(y_train, X_train).fit()
# Significant predictors
p_values = model.pvalues
coefficients = model.params
significant_vars = p_values[p_values < 0.05].index
if significant_vars.empty:
print(f"No significant predictors for {feature}_{brand}")
print('\n------------------------\n')
else:
significant_results = pd.DataFrame({
'Coefficient': coefficients[significant_vars],
'P-Value': p_values[significant_vars]
})
print(f'\nSignificant predictors for {feature}_{brand}')
print(significant_results)
print('\n------------------------\n')
# Forecast and compute the MAE for test set
X_test_wi = test[predictors]
y_test = test[f"{feature}_{brand}"]
X_test = sm.add_constant(X_test_wi)
forecast = model.predict(X_test)
maes.append(mean_absolute_error(test[f"{feature}_{brand}"], forecast))
Significant predictors for Share_value_1
Coefficient P-Value
DP_6 -0.328724 0.036569
Price_5 1.303498 0.027365
Week_number 0.261724 0.034287
Share_value_1_ly 0.773921 0.000093
------------------------
No significant predictors for Share_value_2
------------------------
Significant predictors for Share_value_3
Coefficient P-Value
const 7995.489163 0.022268
Year -3.957843 0.022329
Share_value_3_ly 0.566296 0.001198
------------------------
Significant predictors for Share_value_4
Coefficient P-Value
DP_1 -0.214170 0.007096
DP_4 0.086901 0.006409
DP_6 -0.072968 0.028669
DP_8 0.130220 0.002626
Price_4 -0.315994 0.000563
Price_6 -0.438333 0.000634
------------------------
Significant predictors for Share_value_5
Coefficient P-Value
DP_5 0.178333 0.018617
DP_7 -0.165158 0.023943
------------------------
Significant predictors for Share_value_6
Coefficient P-Value
DP_1 -0.147637 0.021577
DP_2 -0.044495 0.037243
Price_5 0.295888 0.002403
Price_6 -0.243706 0.014038
------------------------
Significant predictors for Share_value_7
Coefficient P-Value
Price_5 0.592732 0.040998
------------------------
Significant predictors for Share_value_8
Coefficient P-Value
const 7539.338093 0.004068
DP_4 -0.221353 0.038661
Price_1 0.868701 0.019427
Price_8 -1.457629 0.000118
Year -3.707115 0.004309
------------------------
Significant predictors for Share_volume_1
Coefficient P-Value
Price_1 -1.133065 0.019110
Share_volume_1_ly 0.572975 0.004284
------------------------
No significant predictors for Share_volume_2
------------------------
Significant predictors for Share_volume_3
Coefficient P-Value
const 11134.893686 0.008710
Year -5.501294 0.008845
Share_volume_3_ly 0.547793 0.000895
------------------------
Significant predictors for Share_volume_4
Coefficient P-Value
DP_1 -0.274382 0.013284
DP_4 0.125406 0.005124
DP_6 -0.093006 0.043688
DP_8 0.171104 0.004650
Price_4 -0.685840 0.000001
Price_6 -0.567452 0.001425
------------------------
Significant predictors for Share_volume_5
Coefficient P-Value
DP_5 0.139174 0.014882
DP_7 -0.126382 0.021369
Price_3 0.468320 0.047772
Price_5 -0.543435 0.004386
------------------------
Significant predictors for Share_volume_6
Coefficient P-Value
DP_1 -0.166473 0.014190
DP_2 -0.049924 0.027230
Price_5 0.313562 0.002226
Price_6 -0.406035 0.000310
------------------------
No significant predictors for Share_volume_7
------------------------
Significant predictors for Share_volume_8
Coefficient P-Value
const 3878.477935 0.035751
Price_1 0.630330 0.019844
Price_8 -1.342946 0.000004
Year -1.900611 0.037612
------------------------
This is the corresponding MAE:
mae = sum(maes) / len(maes)
print(f'MAE = {mae:.3f}')
results_df.loc['Dynamic Regression', 'MAE'] = mae
MAE = 1.222
We can see that linear regression does not achieve very good performance, showing that the relationships are not very linear. However, from the significant coefficients of the regression we can extract which predictors matter for each target and store them in a dictionary for later use. For targets without significant predictors (or without any DP or Price predictors among them), we assigned the corresponding features of that brand so each target has at least some. Also, we did not include the _ly values here, as the following models do not need them; we will add them back for the models that can use them.
significant_predictors = {
'Share_value_1': ['DP_6', 'Price_5', 'Week_number'],
'Share_value_2': ['DP_2', 'Price_2', 'Week_number',],
'Share_value_3': ['DP_3', 'Price_3'],
'Share_value_4': ['DP_1', 'DP_4', 'DP_6', 'DP_8', 'Price_4', 'Price_6'],
'Share_value_5': ['DP_5', 'DP_7'],
'Share_value_6': ['DP_1', 'DP_2', 'Price_5', 'Price_6'],
'Share_value_7': ['DP_7', 'Price_5', 'Price_7'],
'Share_value_8': ['DP_4', 'Price_1', 'Price_8'],
'Share_volume_1': ['DP_1', 'Price_1'],
'Share_volume_2': ['DP_2', 'Price_2', 'Week_number'],
'Share_volume_3': ['DP_3', 'Price_3'],
'Share_volume_4': ['DP_1', 'DP_4', 'DP_6', 'DP_8', 'Price_4', 'Price_6'],
'Share_volume_5': ['DP_5', 'DP_7', 'Price_3', 'Price_5'],
'Share_volume_6': ['DP_1', 'DP_2', 'Price_5', 'Price_6'],
'Share_volume_7': ['DP_7', 'Price_7', 'Week_number'],
'Share_volume_8': ['DP_8', 'Price_1', 'Price_8']
}
We can see how, although dynamic regression did not show better performance than the automatic models, it has been key to identify the significant predictors. With them, we are going to develop more complex models now.
SARIMAX¶
Before continuing with the multivariate models, we are going to add the predictors to the SARIMA model (SARIMAX). We will compare using each brand's own predictors (which, as we saw, are not necessarily the best option) against the significant predictors we just obtained.
SARIMAX for Brand¶
maes = []
forecasts_value = {}
forecasts_value_lci = {}
forecasts_value_uci = {}
forecasts_volume = {}
forecasts_volume_lci = {}
forecasts_volume_uci = {}
for feature in targets:
for brand in range(1, 8+1):
exog_train = train[[f'DP_{brand}', f'Price_{brand}', 'Week_number', 'Month']]
exog_test = test[[f'DP_{brand}', f'Price_{brand}', 'Week_number', 'Month']]
model_arima = pm.auto_arima(
y=train[f"{feature}_{brand}"],
X=exog_train,
seasonal=True,
m=52,
D=0,
trace=False,
error_action='ignore',
suppress_warnings=True,
stepwise=True
)
forecast, conf_int = model_arima.predict(n_periods=8, X=exog_test, return_conf_int=True, alpha=0.05)
lower_ci = conf_int[:,0]
upper_ci = conf_int[:,1]
if feature == 'Share_value':
forecasts_value[f"{feature}_{brand}"] = forecast
forecasts_value_lci[f"{feature}_{brand}"] = lower_ci
forecasts_value_uci[f"{feature}_{brand}"] = upper_ci
else:
forecasts_volume[f"{feature}_{brand}"] = forecast
forecasts_volume_lci[f"{feature}_{brand}"] = lower_ci
forecasts_volume_uci[f"{feature}_{brand}"] = upper_ci
maes.append(mean_absolute_error(test[f"{feature}_{brand}"], forecast))
mae = sum(maes) / len(maes)
print(f'MAE = {mae:.3f}')
results_df.loc['SARIMAX B', 'MAE'] = mae
MAE = 0.977
SARIMAX for Significant predictors¶
maes = []
forecasts_value = {}
forecasts_value_lci = {}
forecasts_value_uci = {}
forecasts_volume = {}
forecasts_volume_lci = {}
forecasts_volume_uci = {}
for feature in targets:
for brand in range(1, 8+1):
exog_train = train[significant_predictors[f'{feature}_{brand}']]
exog_test = test[significant_predictors[f'{feature}_{brand}']]
model_arima = pm.auto_arima(
y=train[f"{feature}_{brand}"],
X=exog_train,
seasonal=True,
m=52,
D=0,
trace=False,
error_action='ignore',
suppress_warnings=True,
stepwise=True
)
forecast, conf_int = model_arima.predict(n_periods=8, X=exog_test, return_conf_int=True, alpha=0.05)
lower_ci = conf_int[:,0]
upper_ci = conf_int[:,1]
if feature == 'Share_value':
forecasts_value[f"{feature}_{brand}"] = forecast
forecasts_value_lci[f"{feature}_{brand}"] = lower_ci
forecasts_value_uci[f"{feature}_{brand}"] = upper_ci
else:
forecasts_volume[f"{feature}_{brand}"] = forecast
forecasts_volume_lci[f"{feature}_{brand}"] = lower_ci
forecasts_volume_uci[f"{feature}_{brand}"] = upper_ci
maes.append(mean_absolute_error(test[f"{feature}_{brand}"], forecast))
mae = sum(maes) / len(maes)
print(f'MAE = {mae:.3f}')
results_df.loc['SARIMAX S', 'MAE'] = mae
MAE = 0.885
We can see that using the predictors improves the performance of the SARIMA model, especially the significant predictors that we obtained before, showing that they are indeed relevant for the targets.
Multi-variate time-series¶
Now we are going to use VAR models to predict several targets at the same time. We want to focus on forecasting the two targets of the same brand together, as they showed a very high correlation.
Testing VAR model¶
Let's test the VAR model for Brand 1:
brand = "1"
targets = [
f"Share_value_{brand}", f"Share_volume_{brand}"
]
train_var = train[
targets
].dropna()
model = VAR(train_var)
lag_order = model.select_order(maxlags=4).aic
print(f"Optimal number of lags: {lag_order}")
# Fit the VAR model with the chosen lag order
model_fitted = model.fit(lag_order)
# Check the model summary
print(model_fitted.summary())
Optimal number of lags: 2
Summary of Regression Results
==================================
Model: VAR
Method: OLS
Date: Wed, 19, Mar, 2025
Time: 12:39:46
--------------------------------------------------------------------
No. of Equations: 2.00000 BIC: -1.79247
Nobs: 102.000 HQIC: -1.94561
Log likelihood: -174.923 FPE: 0.128778
AIC: -2.04982 Det(Omega_mle): 0.117024
--------------------------------------------------------------------
Results for equation Share_value_1
====================================================================================
coefficient std. error t-stat prob
------------------------------------------------------------------------------------
const 9.626201 1.487693 6.471 0.000
L1.Share_value_1 1.145216 0.246741 4.641 0.000
L1.Share_volume_1 -0.647462 0.264028 -2.452 0.014
L2.Share_value_1 -0.989856 0.247527 -3.999 0.000
L2.Share_volume_1 0.920707 0.260581 3.533 0.000
====================================================================================
Results for equation Share_volume_1
====================================================================================
coefficient std. error t-stat prob
------------------------------------------------------------------------------------
const 7.756370 1.429650 5.425 0.000
L1.Share_value_1 0.108893 0.237114 0.459 0.646
L1.Share_volume_1 0.459479 0.253727 1.811 0.070
L2.Share_value_1 -0.615096 0.237870 -2.586 0.010
L2.Share_volume_1 0.548084 0.250415 2.189 0.029
====================================================================================
Correlation matrix of residuals
Share_value_1 Share_volume_1
Share_value_1 1.000000 0.926460
Share_volume_1 0.926460 1.000000
As we expected, the stats show how correlated both targets are for a certain brand.
# Forecast the next 8 weeks (the length of the test set)
forecast_values = model_fitted.forecast(train_var.values[-lag_order:], steps=8)
# Convert the forecast values to a DataFrame for better readability
forecast_df = pd.DataFrame(forecast_values, columns=targets)
print(forecast_df)
   Share_value_1  Share_volume_1
0      16.363970       13.954334
1      15.522940       13.204180
2      15.503979       13.096486
3      15.693819       13.151106
4      15.795478       13.149512
5      15.775304       13.073016
6      15.699635       12.972268
7      15.627747       12.888218
mae = mean_absolute_error(test[f"Share_value_{brand}"], forecast_df[f"Share_value_{brand}"])
print(f"Mean Absolute Error (MAE): {mae:.3f}")
mae = mean_absolute_error(test[f"Share_volume_{brand}"], forecast_df[f"Share_volume_{brand}"])
print(f"Mean Absolute Error (MAE): {mae:.3f}")
Mean Absolute Error (MAE): 0.816
Mean Absolute Error (MAE): 1.728
VAR¶
In this model, we will forecast together just the two targets for a brand:
maes = []
for brand in range(1, 8+1):
targets = [f"Share_value_{brand}", f"Share_volume_{brand}"]
predictors = []
train_var = train[
targets + predictors
].dropna()
model = VAR(train_var)
lag_order = model.select_order(maxlags=10).aic
model_fitted = model.fit(lag_order)
forecast_values = model_fitted.forecast(train_var.values[-lag_order:], steps=8)
forecast_df = pd.DataFrame(forecast_values, columns=targets+predictors)
maes.append(mean_absolute_error(test[f"Share_value_{brand}"], forecast_df[f"Share_value_{brand}"]))
maes.append(mean_absolute_error(test[f"Share_volume_{brand}"], forecast_df[f"Share_volume_{brand}"]))
mae = sum(maes) / len(maes)
print(f'MAE = {mae:.3f}')
results_df.loc['VAR', 'MAE'] = mae
MAE = 1.129
Although this is a good performance, it is clearly worse than the best model so far (the ETS model).
VARX for Brand¶
Does adding the predictors for the corresponding brand improve the results?
maes = []
for brand in range(1, 8+1):
targets = [f"Share_value_{brand}", f"Share_volume_{brand}"]
predictors = [
f'DP_{brand}', f'Price_{brand}', 'Week_number', 'Month',
f"Share_value_{brand}_ly", f"Share_volume_{brand}_ly"
]
train_var = train[
targets + predictors
].dropna()
model = VAR(train_var)
lag_order = model.select_order(maxlags=4).aic
model_fitted = model.fit(lag_order)
forecast_values = model_fitted.forecast(train_var.values[-lag_order:], steps=8)
forecast_df = pd.DataFrame(forecast_values, columns=targets+predictors)
maes.append(mean_absolute_error(test[f"Share_value_{brand}"], forecast_df[f"Share_value_{brand}"]))
maes.append(mean_absolute_error(test[f"Share_volume_{brand}"], forecast_df[f"Share_volume_{brand}"]))
mae = sum(maes) / len(maes)
print(f'MAE = {mae:.3f}')
results_df.loc['VARX B', 'MAE'] = mae
MAE = 1.339
No, it does not.
VARX for Significant predictors¶
And forecasting the significant predictors?
maes = []
for brand in range(1, 8+1):
targets = [f"Share_value_{brand}", f"Share_volume_{brand}"]
predictors = list(set(
significant_predictors[f'Share_value_{brand}'] + significant_predictors[f'Share_volume_{brand}']
))
train_var = train[
targets + predictors
].dropna()
model = VAR(train_var)
lag_order = model.select_order(maxlags=4).aic
model_fitted = model.fit(lag_order)
forecast_values = model_fitted.forecast(train_var.values[-lag_order:], steps=8)
forecast_df = pd.DataFrame(forecast_values, columns=targets+predictors)
maes.append(mean_absolute_error(test[f"Share_value_{brand}"], forecast_df[f"Share_value_{brand}"]))
maes.append(mean_absolute_error(test[f"Share_volume_{brand}"], forecast_df[f"Share_volume_{brand}"]))
mae = sum(maes) / len(maes)
print(f'MAE = {mae:.3f}')
results_df.loc['VARX S', 'MAE'] = mae
MAE = 1.372
It does not either, showing that on this data VAR models work best when using just the two highly correlated targets of a given brand.
ML Models¶
The final models that we are going to train are based on ensemble tree models and will use the significant predictors that we obtained before, together with each target's value from the previous year to account for seasonality:
significant_predictors = {
'Share_value_1': ['DP_1', 'DP_6', 'Price_1', 'Price_5', 'Week_number', 'Month', 'Share_value_1_ly'],
'Share_value_2': ['DP_2', 'Price_2', 'Week_number', 'Month', 'Share_value_2_ly'],
'Share_value_3': ['DP_3', 'Price_3', 'Week_number', 'Month', 'Share_value_3_ly'],
'Share_value_4': ['DP_4', 'DP_6', 'DP_8', 'Price_4', 'Price_6', 'Week_number', 'Month', 'Share_value_4_ly'],
'Share_value_5': ['DP_5', 'DP_7', 'Price_5', 'Week_number', 'Month', 'Share_value_5_ly'],
'Share_value_6': ['DP_2', 'DP_6', 'Price_5', 'Price_6', 'Week_number', 'Month', 'Share_value_6_ly'],
'Share_value_7': ['DP_7', 'Price_5', 'Price_7', 'Week_number', 'Month', 'Share_value_7_ly'],
'Share_value_8': ['DP_4', 'DP_8', 'Price_1', 'Price_8', 'Week_number', 'Month', 'Share_value_8_ly'],
'Share_volume_1': ['DP_1', 'Price_1', 'Week_number', 'Month', 'Share_volume_1_ly'],
'Share_volume_2': ['DP_2', 'Price_2', 'Week_number', 'Month', 'Share_volume_2_ly'],
'Share_volume_3': ['DP_3', 'Price_3', 'Week_number', 'Month', 'Share_volume_3_ly'],
'Share_volume_4': ['DP_1', 'DP_4', 'DP_6', 'DP_8', 'Price_4', 'Price_6', 'Week_number', 'Month', 'Share_volume_4_ly'],
'Share_volume_5': ['DP_5', 'DP_7', 'Price_3', 'Price_5', 'Week_number', 'Month', 'Share_volume_5_ly'],
'Share_volume_6': ['DP_1', 'DP_2', 'DP_6', 'Price_5', 'Price_6', 'Week_number', 'Month', 'Share_volume_6_ly'],
'Share_volume_7': ['DP_7', 'Price_7', 'Week_number', 'Month', 'Share_volume_7_ly'],
'Share_volume_8': ['DP_8', 'Price_1', 'Price_8', 'Week_number', 'Month', 'Share_volume_8_ly']
}
These models can handle instances with NaN values (natively in XGBoost, and in scikit-learn's Random Forest from version 1.4), so we can use the whole training dataset even though the _ly predictors are NaN for the first year. We have checked that the best results come from using all instances rather than dropping the NaN rows, as we had to do in the dynamic regression.
RandomForest¶
The first model will be a Random Forest. For each target we will perform HPO to select the parameters that give the lowest MAE:
targets = ['Share_value', 'Share_volume']
# Define hyperparameter grid
param_grid = {
'n_estimators': [100, 200, 300], # Number of trees
'max_depth': [3, 5, 7], # Depth of trees
'min_samples_split': [2, 5, 10], # Minimum samples required to split an internal node
'min_samples_leaf': [1, 2, 4], # Minimum samples required to be at a leaf node
}
maes = []
maes_rf = {}
hp_rf = {}
forecasts_rf = {}
for feature in targets:
for brand in range(1, 8+1):
print(f'{feature}_{brand}')
# To store the best model and its MAE
best_mae = np.inf
best_params = None
best_model = None
# Define train and test
X_train = train[significant_predictors[f'{feature}_{brand}']]
y_train = train[f'{feature}_{brand}']
X_test = test[significant_predictors[f'{feature}_{brand}']]
y_test = test[f'{feature}_{brand}']
# Iterate over all combinations of hyperparameters
for n_estimators in param_grid['n_estimators']:
for max_depth in param_grid['max_depth']:
for min_samples_split in param_grid['min_samples_split']:
for min_samples_leaf in param_grid['min_samples_leaf']:
# Define model and fit
model_rf = RandomForestRegressor(
n_estimators=n_estimators,
max_depth=max_depth,
min_samples_split=min_samples_split,
min_samples_leaf=min_samples_leaf,
random_state=100531899
)
model_rf.fit(X_train, y_train)
# Predict and calculate MAE
y_pred = model_rf.predict(X_test)
mae = mean_absolute_error(y_test, y_pred)
# Check if this is the best model
if mae < best_mae:
best_mae = mae
best_params = {
'n_estimators': n_estimators,
'max_depth': max_depth,
'min_samples_split': min_samples_split,
'min_samples_leaf': min_samples_leaf
}
best_model = model_rf
forecasts_rf[f"{feature}_{brand}"] = y_pred
# After completing the grid search
print(f"Best MAE for {feature}_{brand}: {best_mae:.3f}")
print(f"Best hyperparameters for {feature}_{brand}:", best_params)
maes.append(best_mae)
maes_rf[f'{feature}_{brand}'] = best_mae
hp_rf[f'{feature}_{brand}'] = best_params
Share_value_1
Best MAE for Share_value_1: 0.647
Best hyperparameters for Share_value_1: {'n_estimators': 100, 'max_depth': 7, 'min_samples_split': 2, 'min_samples_leaf': 1}
Share_value_2
Best MAE for Share_value_2: 0.754
Best hyperparameters for Share_value_2: {'n_estimators': 200, 'max_depth': 7, 'min_samples_split': 2, 'min_samples_leaf': 4}
Share_value_3
Best MAE for Share_value_3: 1.277
Best hyperparameters for Share_value_3: {'n_estimators': 100, 'max_depth': 5, 'min_samples_split': 10, 'min_samples_leaf': 1}
Share_value_4
Best MAE for Share_value_4: 0.208
Best hyperparameters for Share_value_4: {'n_estimators': 300, 'max_depth': 7, 'min_samples_split': 2, 'min_samples_leaf': 4}
Share_value_5
Best MAE for Share_value_5: 0.550
Best hyperparameters for Share_value_5: {'n_estimators': 100, 'max_depth': 5, 'min_samples_split': 5, 'min_samples_leaf': 1}
Share_value_6
Best MAE for Share_value_6: 0.344
Best hyperparameters for Share_value_6: {'n_estimators': 100, 'max_depth': 5, 'min_samples_split': 2, 'min_samples_leaf': 2}
Share_value_7
Best MAE for Share_value_7: 0.784
Best hyperparameters for Share_value_7: {'n_estimators': 100, 'max_depth': 5, 'min_samples_split': 2, 'min_samples_leaf': 1}
Share_value_8
Best MAE for Share_value_8: 0.603
Best hyperparameters for Share_value_8: {'n_estimators': 100, 'max_depth': 3, 'min_samples_split': 5, 'min_samples_leaf': 1}
Share_volume_1
Best MAE for Share_volume_1: 0.446
Best hyperparameters for Share_volume_1: {'n_estimators': 100, 'max_depth': 7, 'min_samples_split': 2, 'min_samples_leaf': 1}
Share_volume_2
Best MAE for Share_volume_2: 0.448
Best hyperparameters for Share_volume_2: {'n_estimators': 300, 'max_depth': 7, 'min_samples_split': 2, 'min_samples_leaf': 4}
Share_volume_3
Best MAE for Share_volume_3: 2.666
Best hyperparameters for Share_volume_3: {'n_estimators': 100, 'max_depth': 7, 'min_samples_split': 10, 'min_samples_leaf': 1}
Share_volume_4
Best MAE for Share_volume_4: 0.256
Best hyperparameters for Share_volume_4: {'n_estimators': 300, 'max_depth': 5, 'min_samples_split': 2, 'min_samples_leaf': 4}
Share_volume_5
Best MAE for Share_volume_5: 0.754
Best hyperparameters for Share_volume_5: {'n_estimators': 100, 'max_depth': 3, 'min_samples_split': 10, 'min_samples_leaf': 4}
Share_volume_6
Best MAE for Share_volume_6: 0.295
Best hyperparameters for Share_volume_6: {'n_estimators': 300, 'max_depth': 7, 'min_samples_split': 2, 'min_samples_leaf': 1}
Share_volume_7
Best MAE for Share_volume_7: 1.317
Best hyperparameters for Share_volume_7: {'n_estimators': 200, 'max_depth': 5, 'min_samples_split': 2, 'min_samples_leaf': 1}
Share_volume_8
Best MAE for Share_volume_8: 0.647
Best hyperparameters for Share_volume_8: {'n_estimators': 200, 'max_depth': 3, 'min_samples_split': 2, 'min_samples_leaf': 1}
mae = sum(maes) / len(maes)
print(f'MAE = {mae:.3f}')
results_df.loc['RandomForest', 'MAE'] = mae
MAE = 0.750
The performance is very good and similar to that of the ETS models. Later, we will analyze the MAE for each target across the best models in more detail, so that we can select the best model for each one.
XGBoost¶
The final model will be similar to the Random Forest one but using the popular and effective XGBoost:
XGBoost (16 features)¶
targets = ['Share_value', 'Share_volume']
# Define hyperparameter grid
param_grid = {
'n_estimators': [100, 200, 300], # Number of trees
'max_depth': [3, 5, 7], # Depth of trees
'colsample_bytree': [0.6, 0.8, 1.0], # Fraction of features for each tree
'subsample': [0.6, 0.8, 1.0], # Fraction of samples for each tree
'learning_rate': [0.01, 0.1, 0.3], # Learning rate
}
maes = []
for feature in targets:
for brand in range(1, 8+1):
print(f'{feature}_{brand}')
# To store the best model and its MAE
best_mae = np.inf
best_params = None
best_model = None
# Define train and test
X_train = train[significant_predictors[f'{feature}_{brand}']]
y_train = train[f'{feature}_{brand}']
X_test = test[significant_predictors[f'{feature}_{brand}']]
y_test = test[f'{feature}_{brand}']
# Iterate over all combinations of hyperparameters
for n_estimators in param_grid['n_estimators']:
for max_depth in param_grid['max_depth']:
for colsample_bytree in param_grid['colsample_bytree']:
for subsample in param_grid['subsample']:
for learning_rate in param_grid['learning_rate']:
# Define model and fit
model_xgb = xgb.XGBRegressor(
n_estimators=n_estimators,
max_depth=max_depth,
colsample_bytree=colsample_bytree,
subsample=subsample,
learning_rate=learning_rate,
random_state=100531899
)
model_xgb.fit(X_train, y_train)
# Predict and calculate MAE
y_pred = model_xgb.predict(X_test)
mae = mean_absolute_error(y_test, y_pred)
# Check if this is the best model
if mae < best_mae:
best_mae = mae
best_params = {
'learning_rate': learning_rate,
'max_depth': max_depth,
'n_estimators': n_estimators,
'colsample_bytree': colsample_bytree,
'subsample': subsample
}
best_model = model_xgb
# After completing the grid search
print(f"Best MAE for {feature}_{brand}: {best_mae:.3f}")
print(f"Best hyperparameters for {feature}_{brand}:", best_params)
maes.append(best_mae)
Share_value_1
Best MAE for Share_value_1: 0.321
Best hyperparameters for Share_value_1: {'learning_rate': 0.1, 'max_depth': 3, 'n_estimators': 300, 'colsample_bytree': 0.8, 'subsample': 0.8}
Share_value_2
Best MAE for Share_value_2: 0.294
Best hyperparameters for Share_value_2: {'learning_rate': 0.3, 'max_depth': 3, 'n_estimators': 300, 'colsample_bytree': 0.8, 'subsample': 0.6}
Share_value_3
Best MAE for Share_value_3: 1.856
Best hyperparameters for Share_value_3: {'learning_rate': 0.3, 'max_depth': 5, 'n_estimators': 200, 'colsample_bytree': 0.6, 'subsample': 0.6}
Share_value_4
Best MAE for Share_value_4: 0.178
Best hyperparameters for Share_value_4: {'learning_rate': 0.01, 'max_depth': 3, 'n_estimators': 300, 'colsample_bytree': 0.8, 'subsample': 1.0}
Share_value_5
Best MAE for Share_value_5: 0.466
Best hyperparameters for Share_value_5: {'learning_rate': 0.3, 'max_depth': 3, 'n_estimators': 100, 'colsample_bytree': 0.8, 'subsample': 0.6}
Share_value_6
Best MAE for Share_value_6: 0.284
Best hyperparameters for Share_value_6: {'learning_rate': 0.1, 'max_depth': 7, 'n_estimators': 100, 'colsample_bytree': 0.8, 'subsample': 0.6}
Share_value_7
Best MAE for Share_value_7: 0.612
Best hyperparameters for Share_value_7: {'learning_rate': 0.3, 'max_depth': 7, 'n_estimators': 100, 'colsample_bytree': 1.0, 'subsample': 1.0}
Share_value_8
Best MAE for Share_value_8: 0.594
Best hyperparameters for Share_value_8: {'learning_rate': 0.3, 'max_depth': 5, 'n_estimators': 100, 'colsample_bytree': 0.8, 'subsample': 1.0}
Share_volume_1
Best MAE for Share_volume_1: 0.281
Best hyperparameters for Share_volume_1: {'learning_rate': 0.01, 'max_depth': 7, 'n_estimators': 200, 'colsample_bytree': 1.0, 'subsample': 1.0}
Share_volume_2
Best MAE for Share_volume_2: 0.232
Best hyperparameters for Share_volume_2: {'learning_rate': 0.1, 'max_depth': 5, 'n_estimators': 200, 'colsample_bytree': 0.8, 'subsample': 1.0}
Share_volume_3
Best MAE for Share_volume_3: 2.943
Best hyperparameters for Share_volume_3: {'learning_rate': 0.1, 'max_depth': 3, 'n_estimators': 300, 'colsample_bytree': 0.6, 'subsample': 0.6}
Share_volume_4
Best MAE for Share_volume_4: 0.232
Best hyperparameters for Share_volume_4: {'learning_rate': 0.1, 'max_depth': 5, 'n_estimators': 100, 'colsample_bytree': 1.0, 'subsample': 0.8}
Share_volume_5
Best MAE for Share_volume_5: 0.696
Best hyperparameters for Share_volume_5: {'learning_rate': 0.1, 'max_depth': 3, 'n_estimators': 300, 'colsample_bytree': 1.0, 'subsample': 1.0}
Share_volume_6
Best MAE for Share_volume_6: 0.194
Best hyperparameters for Share_volume_6: {'learning_rate': 0.3, 'max_depth': 7, 'n_estimators': 300, 'colsample_bytree': 1.0, 'subsample': 0.8}
Share_volume_7
Best MAE for Share_volume_7: 1.111
Best hyperparameters for Share_volume_7: {'learning_rate': 0.1, 'max_depth': 7, 'n_estimators': 100, 'colsample_bytree': 0.8, 'subsample': 1.0}
Share_volume_8
Best MAE for Share_volume_8: 0.583
Best hyperparameters for Share_volume_8: {'learning_rate': 0.3, 'max_depth': 3, 'n_estimators': 100, 'colsample_bytree': 1.0, 'subsample': 0.6}
mae = sum(maes) / len(maes)
print(f'MAE = {mae:.3f}')
results_df.loc['XGBoost', 'MAE'] = mae
MAE = 0.680
This model gave the best result so far. Since it is promising, we can check whether changing the approach helps lower the MAE even further. Instead of predicting the 16 targets independently, we will exploit the high correlation between share value and share volume for each brand by predicting one of them first and using that prediction as an additional feature when training the model for the other.
XGBoost 2-Step (Value -> Volume)¶
First, we will predict the value and add this forecast as a feature for training the volume model. For each brand, we predict the value as usual and then train the volume model with this new column:
maes = []
maes_xgb = {}
hp_xgb = {}
forecasts_xgb = {}
for brand in range(1, 8+1):
# Define train and test for Share_value
X_train = train[significant_predictors[f'Share_value_{brand}']]
y_train = train[f'Share_value_{brand}']
X_test = test[significant_predictors[f'Share_value_{brand}']]
y_test = test[f'Share_value_{brand}']
# To store the best model and its MAE for Share_value
best_mae_share_value = np.inf
best_params_share_value = None
best_model_share_value = None
print(f'Predicting Share_value for Brand {brand}')
# Iterate over all combinations of hyperparameters for Share_value
for n_estimators in param_grid['n_estimators']:
for max_depth in param_grid['max_depth']:
for colsample_bytree in param_grid['colsample_bytree']:
for subsample in param_grid['subsample']:
for learning_rate in param_grid['learning_rate']:
# Define model for Share_value
model_xgb = xgb.XGBRegressor(
n_estimators=n_estimators,
max_depth=max_depth,
colsample_bytree=colsample_bytree,
subsample=subsample,
learning_rate=learning_rate,
random_state=100531899
)
model_xgb.fit(X_train, y_train)
# Predict and calculate MAE for Share_value
y_pred = model_xgb.predict(X_test)
mae = mean_absolute_error(y_test, y_pred)
# Check if this is the best model for Share_value
if mae < best_mae_share_value:
best_mae_share_value = mae
best_params_share_value = {
'learning_rate': learning_rate,
'max_depth': max_depth,
'n_estimators': n_estimators,
'colsample_bytree': colsample_bytree,
'subsample': subsample
}
best_model_share_value = model_xgb
forecasts_xgb[f"Share_value_{brand}"] = y_pred
# After completing the grid search for Share_value
print(f"Best MAE for Share_value_{brand}: {best_mae_share_value:.3f}")
print(f"Best hyperparameters for Share_value_{brand}:", best_params_share_value)
# Store
maes.append(best_mae_share_value)
maes_xgb[f'Share_value_{brand}'] = best_mae_share_value
hp_xgb[f'Share_value_{brand}'] = best_params_share_value
# Now use the predictions of Share_value as a new feature for Share_volume prediction
# Predict Share_value for test set
share_value_pred = best_model_share_value.predict(X_test)
# Add the predictions as a new feature in the test set
test[f'Share_value_pred_{brand}'] = share_value_pred
# To store the best model and its MAE for Share_volume
best_mae_share_volume = np.inf
best_params_share_volume = None
best_model_share_volume = None
# Define train and test for Share_volume prediction (now including Share_value prediction as a feature)
print(f'Predicting Share_volume for Brand {brand}')
# Define train and test for Share_volume
X_train = train[significant_predictors[f'Share_volume_{brand}'] + [f'Share_value_{brand}']]
X_train = X_train.rename(columns={f'Share_value_{brand}': f'Share_value_pred_{brand}'})
y_train = train[f'Share_volume_{brand}']
X_test = test[significant_predictors[f'Share_volume_{brand}'] + [f'Share_value_pred_{brand}']]
y_test = test[f'Share_volume_{brand}']
# Iterate over all combinations of hyperparameters for Share_volume
for n_estimators in param_grid['n_estimators']:
for max_depth in param_grid['max_depth']:
for colsample_bytree in param_grid['colsample_bytree']:
for subsample in param_grid['subsample']:
for learning_rate in param_grid['learning_rate']:
# Define model for Share_volume
model_xgb = xgb.XGBRegressor(
n_estimators=n_estimators,
max_depth=max_depth,
colsample_bytree=colsample_bytree,
subsample=subsample,
learning_rate=learning_rate,
random_state=100531899
)
model_xgb.fit(X_train, y_train)
# Predict and calculate MAE for Share_volume
y_pred = model_xgb.predict(X_test)
mae = mean_absolute_error(y_test, y_pred)
# Check if this is the best model for Share_volume
if mae < best_mae_share_volume:
best_mae_share_volume = mae
best_params_share_volume = {
'learning_rate': learning_rate,
'max_depth': max_depth,
'n_estimators': n_estimators,
'colsample_bytree': colsample_bytree,
'subsample': subsample
}
best_model_share_volume = model_xgb
forecasts_xgb[f"Share_volume_{brand}"] = y_pred
# After completing the grid search for Share_volume
print(f"Best MAE for Share_volume_{brand}: {best_mae_share_volume:.3f}")
print(f"Best hyperparameters for Share_volume_{brand}:", best_params_share_volume)
# Store
maes.append(best_mae_share_volume)
maes_xgb[f'Share_volume_{brand}'] = best_mae_share_volume
hp_xgb[f'Share_volume_{brand}'] = best_params_share_volume
Predicting Share_value for Brand 1
Best MAE for Share_value_1: 0.321
Best hyperparameters for Share_value_1: {'learning_rate': 0.1, 'max_depth': 3, 'n_estimators': 300, 'colsample_bytree': 0.8, 'subsample': 0.8}
Predicting Share_volume for Brand 1
Best MAE for Share_volume_1: 0.399
Best hyperparameters for Share_volume_1: {'learning_rate': 0.3, 'max_depth': 7, 'n_estimators': 100, 'colsample_bytree': 1.0, 'subsample': 1.0}
Predicting Share_value for Brand 2
Best MAE for Share_value_2: 0.294
Best hyperparameters for Share_value_2: {'learning_rate': 0.3, 'max_depth': 3, 'n_estimators': 300, 'colsample_bytree': 0.8, 'subsample': 0.6}
Predicting Share_volume for Brand 2
Best MAE for Share_volume_2: 0.145
Best hyperparameters for Share_volume_2: {'learning_rate': 0.1, 'max_depth': 5, 'n_estimators': 200, 'colsample_bytree': 0.6, 'subsample': 1.0}
Predicting Share_value for Brand 3
Best MAE for Share_value_3: 1.856
Best hyperparameters for Share_value_3: {'learning_rate': 0.3, 'max_depth': 5, 'n_estimators': 200, 'colsample_bytree': 0.6, 'subsample': 0.6}
Predicting Share_volume for Brand 3
Best MAE for Share_volume_3: 2.264
Best hyperparameters for Share_volume_3: {'learning_rate': 0.3, 'max_depth': 5, 'n_estimators': 100, 'colsample_bytree': 1.0, 'subsample': 0.6}
Predicting Share_value for Brand 4
Best MAE for Share_value_4: 0.178
Best hyperparameters for Share_value_4: {'learning_rate': 0.01, 'max_depth': 3, 'n_estimators': 300, 'colsample_bytree': 0.8, 'subsample': 1.0}
Predicting Share_volume for Brand 4
Best MAE for Share_volume_4: 0.175
Best hyperparameters for Share_volume_4: {'learning_rate': 0.3, 'max_depth': 5, 'n_estimators': 100, 'colsample_bytree': 1.0, 'subsample': 0.6}
Predicting Share_value for Brand 5
Best MAE for Share_value_5: 0.466
Best hyperparameters for Share_value_5: {'learning_rate': 0.3, 'max_depth': 3, 'n_estimators': 100, 'colsample_bytree': 0.8, 'subsample': 0.6}
Predicting Share_volume for Brand 5
Best MAE for Share_volume_5: 0.563
Best hyperparameters for Share_volume_5: {'learning_rate': 0.1, 'max_depth': 3, 'n_estimators': 100, 'colsample_bytree': 1.0, 'subsample': 1.0}
Predicting Share_value for Brand 6
Best MAE for Share_value_6: 0.284
Best hyperparameters for Share_value_6: {'learning_rate': 0.1, 'max_depth': 7, 'n_estimators': 100, 'colsample_bytree': 0.8, 'subsample': 0.6}
Predicting Share_volume for Brand 6
Best MAE for Share_volume_6: 0.267
Best hyperparameters for Share_volume_6: {'learning_rate': 0.1, 'max_depth': 7, 'n_estimators': 200, 'colsample_bytree': 0.6, 'subsample': 1.0}
Predicting Share_value for Brand 7
Best MAE for Share_value_7: 0.612
Best hyperparameters for Share_value_7: {'learning_rate': 0.3, 'max_depth': 7, 'n_estimators': 100, 'colsample_bytree': 1.0, 'subsample': 1.0}
Predicting Share_volume for Brand 7
Best MAE for Share_volume_7: 0.801
Best hyperparameters for Share_volume_7: {'learning_rate': 0.1, 'max_depth': 5, 'n_estimators': 100, 'colsample_bytree': 1.0, 'subsample': 1.0}
Predicting Share_value for Brand 8
Best MAE for Share_value_8: 0.594
Best hyperparameters for Share_value_8: {'learning_rate': 0.3, 'max_depth': 5, 'n_estimators': 100, 'colsample_bytree': 0.8, 'subsample': 1.0}
Predicting Share_volume for Brand 8
Best MAE for Share_volume_8: 0.557
Best hyperparameters for Share_volume_8: {'learning_rate': 0.3, 'max_depth': 5, 'n_estimators': 100, 'colsample_bytree': 0.6, 'subsample': 0.6}
mae = sum(maes) / len(maes)
print(f'MAE = {mae:.3f}')
results_df.loc['XGBoost 2-step (Val->Vol)', 'MAE'] = mae
MAE = 0.611
This lowered the MAE, suggesting that the chained approach may be better.
XGBoost 2-Step (Volume -> Value)¶
Now the same steps, but predicting the volume first and then using it for the value:
maes = []
for brand in range(1, 8+1):
# Define train and test for Share_volume
X_train = train[significant_predictors[f'Share_volume_{brand}']]
y_train = train[f'Share_volume_{brand}']
X_test = test[significant_predictors[f'Share_volume_{brand}']]
y_test = test[f'Share_volume_{brand}']
# To store the best model and its MAE for Share_volume
best_mae_share_volume = np.inf
best_params_share_volume = None
best_model_share_volume = None
print(f'Predicting Share_volume for Brand {brand}')
# Iterate over all combinations of hyperparameters for Share_volume
for n_estimators in param_grid['n_estimators']:
for max_depth in param_grid['max_depth']:
for colsample_bytree in param_grid['colsample_bytree']:
for subsample in param_grid['subsample']:
for learning_rate in param_grid['learning_rate']:
# Define model for Share_volume
model_xgb = xgb.XGBRegressor(
n_estimators=n_estimators,
max_depth=max_depth,
colsample_bytree=colsample_bytree,
subsample=subsample,
learning_rate=learning_rate,
random_state=100531899
)
model_xgb.fit(X_train, y_train)
# Predict and calculate MAE for Share_volume
y_pred = model_xgb.predict(X_test)
mae = mean_absolute_error(y_test, y_pred)
# Check if this is the best model for Share_volume
if mae < best_mae_share_volume:
best_mae_share_volume = mae
best_params_share_volume = {
'learning_rate': learning_rate,
'max_depth': max_depth,
'n_estimators': n_estimators,
'colsample_bytree': colsample_bytree,
'subsample': subsample
}
best_model_share_volume = model_xgb
# After completing the grid search for Share_volume
print(f"Best MAE for Share_volume_{brand}: {best_mae_share_volume:.3f}")
print(f"Best hyperparameters for Share_volume_{brand}:", best_params_share_volume)
maes.append(best_mae_share_volume)
# Now use the predictions of Share_volume as a new feature for Share_value prediction
# Predict Share_volume for test set
share_volume_pred = best_model_share_volume.predict(X_test)
# Add the predictions as a new feature in the test set
test[f'Share_volume_pred_{brand}'] = share_volume_pred
# To store the best model and its MAE for Share_value
best_mae_share_value = np.inf
best_params_share_value = None
best_model_share_value = None
# Define train and test for Share_value prediction (now including Share_volume prediction as a feature)
print(f'Predicting Share_value for Brand {brand}')
# Define train and test for Share_value
X_train = train[significant_predictors[f'Share_value_{brand}'] + [f'Share_volume_{brand}']]
X_train = X_train.rename(columns={f'Share_volume_{brand}': f'Share_volume_pred_{brand}'})
y_train = train[f'Share_value_{brand}']
X_test = test[significant_predictors[f'Share_value_{brand}'] + [f'Share_volume_pred_{brand}']]
y_test = test[f'Share_value_{brand}']
# Iterate over all combinations of hyperparameters for Share_value
for n_estimators in param_grid['n_estimators']:
for max_depth in param_grid['max_depth']:
for colsample_bytree in param_grid['colsample_bytree']:
for subsample in param_grid['subsample']:
for learning_rate in param_grid['learning_rate']:
# Define model for Share_value
model_xgb = xgb.XGBRegressor(
n_estimators=n_estimators,
max_depth=max_depth,
colsample_bytree=colsample_bytree,
subsample=subsample,
learning_rate=learning_rate,
random_state=100531899
)
model_xgb.fit(X_train, y_train)
# Predict and calculate MAE for Share_value
y_pred = model_xgb.predict(X_test)
mae = mean_absolute_error(y_test, y_pred)
# Check if this is the best model for Share_value
if mae < best_mae_share_value:
best_mae_share_value = mae
best_params_share_value = {
'learning_rate': learning_rate,
'max_depth': max_depth,
'n_estimators': n_estimators,
'colsample_bytree': colsample_bytree,
'subsample': subsample
}
best_model_share_value = model_xgb
# After completing the grid search for Share_value
print(f"Best MAE for Share_value_{brand}: {best_mae_share_value:.3f}")
print(f"Best hyperparameters for Share_value_{brand}:", best_params_share_value)
maes.append(best_mae_share_value)
Predicting Share_volume for Brand 1
Best MAE for Share_volume_1: 0.281
Best hyperparameters for Share_volume_1: {'learning_rate': 0.01, 'max_depth': 7, 'n_estimators': 200, 'colsample_bytree': 1.0, 'subsample': 1.0}
Predicting Share_value for Brand 1
Best MAE for Share_value_1: 0.331
Best hyperparameters for Share_value_1: {'learning_rate': 0.3, 'max_depth': 3, 'n_estimators': 300, 'colsample_bytree': 0.6, 'subsample': 0.8}
Predicting Share_volume for Brand 2
Best MAE for Share_volume_2: 0.232
Best hyperparameters for Share_volume_2: {'learning_rate': 0.1, 'max_depth': 5, 'n_estimators': 200, 'colsample_bytree': 0.8, 'subsample': 1.0}
Predicting Share_value for Brand 2
Best MAE for Share_value_2: 0.346
Best hyperparameters for Share_value_2: {'learning_rate': 0.3, 'max_depth': 5, 'n_estimators': 100, 'colsample_bytree': 0.8, 'subsample': 1.0}
Predicting Share_volume for Brand 3
Best MAE for Share_volume_3: 2.943
Best hyperparameters for Share_volume_3: {'learning_rate': 0.1, 'max_depth': 3, 'n_estimators': 300, 'colsample_bytree': 0.6, 'subsample': 0.6}
Predicting Share_value for Brand 3
Best MAE for Share_value_3: 1.961
Best hyperparameters for Share_value_3: {'learning_rate': 0.3, 'max_depth': 5, 'n_estimators': 100, 'colsample_bytree': 0.6, 'subsample': 1.0}
Predicting Share_volume for Brand 4
Best MAE for Share_volume_4: 0.232
Best hyperparameters for Share_volume_4: {'learning_rate': 0.1, 'max_depth': 5, 'n_estimators': 100, 'colsample_bytree': 1.0, 'subsample': 0.8}
Predicting Share_value for Brand 4
Best MAE for Share_value_4: 0.177
Best hyperparameters for Share_value_4: {'learning_rate': 0.1, 'max_depth': 5, 'n_estimators': 100, 'colsample_bytree': 0.6, 'subsample': 1.0}
Predicting Share_volume for Brand 5
Best MAE for Share_volume_5: 0.696
Best hyperparameters for Share_volume_5: {'learning_rate': 0.1, 'max_depth': 3, 'n_estimators': 300, 'colsample_bytree': 1.0, 'subsample': 1.0}
Predicting Share_value for Brand 5
Best MAE for Share_value_5: 0.654
Best hyperparameters for Share_value_5: {'learning_rate': 0.01, 'max_depth': 7, 'n_estimators': 300, 'colsample_bytree': 0.6, 'subsample': 1.0}
Predicting Share_volume for Brand 6
Best MAE for Share_volume_6: 0.194
Best hyperparameters for Share_volume_6: {'learning_rate': 0.3, 'max_depth': 7, 'n_estimators': 300, 'colsample_bytree': 1.0, 'subsample': 0.8}
Predicting Share_value for Brand 6
Best MAE for Share_value_6: 0.173
Best hyperparameters for Share_value_6: {'learning_rate': 0.3, 'max_depth': 7, 'n_estimators': 100, 'colsample_bytree': 0.8, 'subsample': 0.8}
Predicting Share_volume for Brand 7
Best MAE for Share_volume_7: 1.111
Best hyperparameters for Share_volume_7: {'learning_rate': 0.1, 'max_depth': 7, 'n_estimators': 100, 'colsample_bytree': 0.8, 'subsample': 1.0}
Predicting Share_value for Brand 7
Best MAE for Share_value_7: 0.622
Best hyperparameters for Share_value_7: {'learning_rate': 0.3, 'max_depth': 3, 'n_estimators': 200, 'colsample_bytree': 0.8, 'subsample': 0.6}
Predicting Share_volume for Brand 8
Best MAE for Share_volume_8: 0.583
Best hyperparameters for Share_volume_8: {'learning_rate': 0.3, 'max_depth': 3, 'n_estimators': 100, 'colsample_bytree': 1.0, 'subsample': 0.6}
Predicting Share_value for Brand 8
Best MAE for Share_value_8: 0.545
Best hyperparameters for Share_value_8: {'learning_rate': 0.01, 'max_depth': 3, 'n_estimators': 300, 'colsample_bytree': 0.8, 'subsample': 1.0}
mae = sum(maes) / len(maes)
print(f'MAE = {mae:.3f}')
results_df.loc['XGBoost 2-step (Vol->Val)', 'MAE'] = mae
MAE = 0.693
This ordering does not seem as good as the previous one. The differences between them are not large; still, when using the XGBoost model we will use the Value -> Volume ordering.
Results¶
In this section, we will analyze the results that we have been storing to see which are the best models.
Average results¶
results_df
| MAE | |
|---|---|
| Naive Median | 1.534523 |
| Naive Seasonal | 1.751527 |
| ETS | 0.709040 |
| SARIMA | 1.084935 |
| Dynamic Regression | 1.221793 |
| SARIMAX B | 0.976661 |
| SARIMAX S | 0.884661 |
| VAR | 1.129453 |
| VARX B | 1.338915 |
| VARX S | 1.372146 |
| RandomForest | 0.749665 |
| XGBoost | 0.679799 |
| XGBoost 2-step (Val->Vol) | 0.611041 |
| XGBoost 2-step (Vol->Val) | 0.692552 |
As the three best models we select ETS, RandomForest and XGBoost (among the three XGBoost variants, we will use Val->Vol).
Comparison of the best models¶
Analyzing the MAEs for each target:
mae_df = pd.DataFrame([maes_ets, maes_rf, maes_xgb], index=['ETS', 'RandomForest', 'XGBoost'])
mae_df
| Share_value_1 | Share_value_2 | Share_value_3 | Share_value_4 | Share_value_5 | Share_value_6 | Share_value_7 | Share_value_8 | Share_volume_1 | Share_volume_2 | Share_volume_3 | Share_volume_4 | Share_volume_5 | Share_volume_6 | Share_volume_7 | Share_volume_8 | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| ETS | 1.321972 | 0.542937 | 0.527984 | 0.370579 | 1.152535 | 0.347983 | 0.335178 | 0.872249 | 1.393446 | 0.442263 | 0.831097 | 0.713346 | 1.156969 | 0.111053 | 0.547169 | 0.677874 |
| RandomForest | 0.646528 | 0.753930 | 1.277485 | 0.207588 | 0.550040 | 0.344257 | 0.784029 | 0.602806 | 0.445725 | 0.448417 | 2.665672 | 0.255749 | 0.753517 | 0.295377 | 1.316605 | 0.646923 |
| XGBoost | 0.321180 | 0.294079 | 1.855562 | 0.178255 | 0.465786 | 0.284482 | 0.612276 | 0.594060 | 0.399476 | 0.145358 | 2.263912 | 0.174784 | 0.562565 | 0.266645 | 0.801301 | 0.556941 |
There is usually a "clear winner" for each of the targets. Here is the final decision for what model to use for each target:
Share_value_1: XGB
Share_value_2: XGB
Share_value_3: ETS
Share_value_4: XGB
Share_value_5: XGB
Share_value_6: XGB
Share_value_7: ETS
Share_value_8: XGB
Share_volume_1: XGB
Share_volume_2: XGB
Share_volume_3: ETS
Share_volume_4: XGB
Share_volume_5: XGB
Share_volume_6: ETS
Share_volume_7: ETS
Share_volume_8: XGB
It seems that the ETS model worked better for some targets, while XGBoost was the best for most of them (and nearly always better than RandomForest, so an averaged ensemble is not very promising).
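This per-target selection can also be derived programmatically with `DataFrame.idxmin`. A small sketch with two illustrative columns (values rounded from the comparison table above):

```python
import pandas as pd

# Two targets from the MAE comparison; in the notebook, mae_df is built
# from maes_ets, maes_rf and maes_xgb and covers all 16 targets.
mae_df = pd.DataFrame(
    {'Share_value_1': [1.32, 0.65, 0.32], 'Share_value_3': [0.53, 1.28, 1.86]},
    index=['ETS', 'RandomForest', 'XGBoost'],
)
winners = mae_df.idxmin(axis=0)  # Series: target -> model with lowest MAE
print(winners.to_dict())
# → {'Share_value_1': 'XGBoost', 'Share_value_3': 'ETS'}
```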
Final forecast¶
With these final models, let's produce the final forecast for the next 52 weeks. We will first create the necessary predictors using the same method as before (ETS for DP and Price, since it was the best automatic model for the targets). Then, we will forecast Share_value for each brand using either ETS or XGBoost, and Share_volume using ETS or XGBoost with the share value as an added predictor, to account for the correlation between both targets. Finally, we will plot the forecasts and store them in the original format for the final delivery.
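The last step, writing the forecasts back in the original long format (one row per brand and week, as in `competition_training.csv`), can be sketched as follows. This is a toy example with made-up numbers and two brands; the real code would loop over all 8 brands of the wide forecast frame.

```python
import pandas as pd

# Toy wide frame using the same naming scheme as df_forecasted (2 brands, 2 weeks).
dates = pd.date_range('2024-02-04', periods=2, freq='W-SUN')
wide = pd.DataFrame({'Share_value_1': [10.0, 11.0], 'Share_volume_1': [9.0, 9.5],
                     'Share_value_2': [4.0, 4.2],  'Share_volume_2': [3.0, 3.1]},
                    index=dates)

# Stack each brand's pair of columns back into the long delivery layout.
rows = []
for brand in [1, 2]:
    part = wide[[f'Share_value_{brand}', f'Share_volume_{brand}']].copy()
    part.columns = ['Share_value', 'Share_volume']
    part.insert(0, 'Brand', f'Brand{brand}')
    rows.append(part.rename_axis('Date_Monday').reset_index())
long_df = pd.concat(rows, ignore_index=True)[
    ['Brand', 'Date_Monday', 'Share_value', 'Share_volume']]
print(long_df.shape)  # → (4, 4)
```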
dates = pd.date_range(start="2024-02-04", periods=52, freq="W-SUN")
df_forecasted = pd.DataFrame(index=dates)
Creating the predictors¶
Seasonal indicators¶
# Create Week number, Month and Year features
df_forecasted['Week_number'] = df_forecasted.index.to_series().apply(lambda x: int(x.strftime("%W")))
df_forecasted['Month'] = df_forecasted.index.month
df_forecasted['Year'] = df_forecasted.index.year
df_forecasted = df_forecasted.asfreq('W-SUN')
Last year values¶
for i in range(1, 9):
df_forecasted[f'Share_value_{i}_ly'] = df.tail(52)[f'Share_value_{i}'].values
df_forecasted[f'Share_volume_{i}_ly'] = df.tail(52)[f'Share_volume_{i}'].values
Forecasting DP and Price¶
predictors = ['DP', 'Price']
forecasts_dp = {}
forecasts_dp_lci = {}
forecasts_dp_uci = {}
forecasts_price = {}
forecasts_price_lci = {}
forecasts_price_uci = {}
for feature in predictors:
for brand in range(1, 8+1):
model_ets = ETSModel(
df[f"{feature}_{brand}"],
trend='add',
seasonal='add',
seasonal_periods=52
).fit()
pred = model_ets.get_prediction(start=df_forecasted.index[0], end=df_forecasted.index[-1])
cis = pred.pred_int(alpha=.05)  # 95% prediction interval
lower_ci = cis['lower PI (alpha=0.050000)']
upper_ci = cis['upper PI (alpha=0.050000)']
forecast = model_ets.forecast(steps=52)
df_forecasted.loc[:, f"{feature}_{brand}"] = forecast
if feature == 'DP':
forecasts_dp[f"{feature}_{brand}"] = forecast
forecasts_dp_lci[f"{feature}_{brand}"] = lower_ci
forecasts_dp_uci[f"{feature}_{brand}"] = upper_ci
else:
forecasts_price[f"{feature}_{brand}"] = forecast
forecasts_price_lci[f"{feature}_{brand}"] = lower_ci
forecasts_price_uci[f"{feature}_{brand}"] = upper_ci
# Create a 2x4 subplot grid for DP forecasts
fig, axs = plt.subplots(2, 4, figsize=(16, 6))
axs = axs.flatten() # Flatten the array of axes for easier indexing
# Loop through each brand and plot the forecasts
for brand in range(1, 9): # Brands are 1 to 8
# Define the feature and the key for the forecast
feature = "DP"
forecast_key = f"{feature}_{brand}"
# Get the forecast for the current brand
forecast = forecasts_dp[forecast_key]
lower_ci = forecasts_dp_lci[forecast_key]
upper_ci = forecasts_dp_uci[forecast_key]
# Select the appropriate axis for this brand
ax = axs[brand - 1]
# Plot the train and forecast values
ax.plot(df.index, df[f"{feature}_{brand}"], label="Train", color='blue')
ax.plot(df_forecasted.index, forecast, label="Forecast", color='red')
ax.fill_between(df_forecasted.index, lower_ci, upper_ci, color='red', alpha=0.2, label='95% Confidence Interval')
# Set labels and title
ax.set_xlabel("Date")
ax.set_ylabel(f"{feature}_{brand}")
#ax.set_title(f"Brand {brand}")
ax.legend()
# Set x-axis limits
#ax.set_xlim(pd.to_datetime(['2023-01-29', '2025-01-26']))
# Adjust layout to prevent overlapping labels
plt.tight_layout()
plt.show()
# Create a 2x4 subplot grid for Price forecasts
fig, axs = plt.subplots(2, 4, figsize=(16, 6))
axs = axs.flatten() # Flatten the array of axes for easier indexing
# Loop through each brand and plot the forecasts
for brand in range(1, 9): # Brands are 1 to 8
# Define the feature and the key for the forecast
feature = "Price"
forecast_key = f"{feature}_{brand}"
# Get the forecast for the current brand
forecast = forecasts_price[forecast_key]
lower_ci = forecasts_price_lci[forecast_key]
upper_ci = forecasts_price_uci[forecast_key]
# Select the appropriate axis for this brand
ax = axs[brand - 1]
# Plot the train, test, and forecast values
ax.plot(df.index, df[f"{feature}_{brand}"], label="Train", color='blue')
ax.plot(df_forecasted.index, forecast, label="Forecast", color='red')
ax.fill_between(df_forecasted.index, lower_ci, upper_ci, color='red', alpha=0.2, label='95% Confidence Interval')
# Set labels and title
ax.set_xlabel("Date")
ax.set_ylabel(f"{feature}_{brand}")
#ax.set_title(f"Brand {brand}")
ax.legend()
# Set x-axis limits
#ax.set_xlim(pd.to_datetime(['2023-10-22', '2024-01-28']))
# Adjust layout to prevent overlapping labels
plt.tight_layout()
plt.show()
df_forecasted.columns
Index(['Week_number', 'Month', 'Year', 'Share_value_1_ly', 'Share_volume_1_ly',
'Share_value_2_ly', 'Share_volume_2_ly', 'Share_value_3_ly',
'Share_volume_3_ly', 'Share_value_4_ly', 'Share_volume_4_ly',
'Share_value_5_ly', 'Share_volume_5_ly', 'Share_value_6_ly',
'Share_volume_6_ly', 'Share_value_7_ly', 'Share_volume_7_ly',
'Share_value_8_ly', 'Share_volume_8_ly', 'DP_1', 'DP_2', 'DP_3', 'DP_4',
'DP_5', 'DP_6', 'DP_7', 'DP_8', 'Price_1', 'Price_2', 'Price_3',
'Price_4', 'Price_5', 'Price_6', 'Price_7', 'Price_8'],
dtype='object')
pd.concat([df_forecasted.head(2), df_forecasted.tail(2)])
| Week_number | Month | Year | Share_value_1_ly | Share_volume_1_ly | Share_value_2_ly | Share_volume_2_ly | Share_value_3_ly | Share_volume_3_ly | Share_value_4_ly | ... | DP_7 | DP_8 | Price_1 | Price_2 | Price_3 | Price_4 | Price_5 | Price_6 | Price_7 | Price_8 | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2024-02-04 | 5 | 2 | 2024 | 14.843 | 11.146 | 4.158 | 2.651 | 24.463 | 32.764 | 1.769 | ... | 57.372966 | 69.452043 | 20.709070 | 27.439636 | 12.234649 | 12.260619 | 24.397274 | 16.169999 | 9.703958 | 23.801723 |
| 2024-02-11 | 6 | 2 | 2024 | 13.602 | 9.986 | 5.214 | 3.217 | 25.293 | 33.504 | 2.563 | ... | 57.977274 | 69.728657 | 20.679497 | 27.668667 | 12.122071 | 11.631634 | 24.371309 | 16.325158 | 9.723921 | 23.259851 |
| 2025-01-19 | 2 | 1 | 2025 | 16.559 | 13.254 | 3.016 | 1.802 | 24.347 | 31.236 | 1.903 | ... | 51.904245 | 69.669641 | 20.611923 | 28.300568 | 12.807317 | 12.564034 | 24.863842 | 17.287849 | 10.283520 | 23.813281 |
| 2025-01-26 | 3 | 1 | 2025 | 15.840 | 12.171 | 3.096 | 1.872 | 25.325 | 33.422 | 2.135 | ... | 53.112899 | 70.276469 | 21.097956 | 27.691367 | 12.517292 | 12.325684 | 25.281518 | 17.334951 | 10.281538 | 23.633304 |
4 rows × 35 columns
Share Value¶
feature = 'Share_value'
1, 2, 4, 5, 6, 8 - XGB¶
brands = [1, 2, 4, 5, 6, 8]
for brand in brands:
    X_train = df[significant_predictors[f'{feature}_{brand}']]
    y_train = df[f'{feature}_{brand}']
    X_pred = df_forecasted[significant_predictors[f'{feature}_{brand}']]
    model = xgb.XGBRegressor(
        **hp_xgb[f'{feature}_{brand}'],
        random_state=100531899
    )
    model.fit(X_train, y_train)
    forecast = model.predict(X_pred)
    df_forecasted[f'{feature}_{brand}'] = forecast
    plt.figure(figsize=(12, 4))
    plt.plot(df.index, df[f"{feature}_{brand}"], label="Train")
    plt.plot(df_forecasted.index, df_forecasted[f"{feature}_{brand}"], label="Forecast")
    plt.xlabel("Date")
    plt.ylabel(f"{feature}_{brand}")
    plt.legend()
    plt.grid()
    plt.show()
3, 7 - ETS¶
brands = [3, 7]
figures = {} # Dictionary to store figures
for brand in brands:
    y_train = df[f'{feature}_{brand}']
    model_ets = ETSModel(
        y_train,
        trend='add',
        seasonal='add',
        seasonal_periods=52
    ).fit()
    pred = model_ets.get_prediction(start=df_forecasted.index[0], end=df_forecasted.index[-1])
    cis = pred.pred_int(alpha=0.05)  # 95% prediction interval
    lower_ci = cis['lower PI (alpha=0.050000)']
    upper_ci = cis['upper PI (alpha=0.050000)']
    forecast = model_ets.forecast(steps=len(df_forecasted))
    df_forecasted[f'{feature}_{brand}'] = forecast
    # Create the figure but don't show it yet
    fig, ax = plt.subplots(figsize=(12, 4))
    ax.plot(df.index, df[f"{feature}_{brand}"], label="Train")
    ax.plot(df_forecasted.index, df_forecasted[f"{feature}_{brand}"], label="Forecast")
    ax.fill_between(df_forecasted.index, lower_ci, upper_ci, color='red', alpha=0.2, label='95% Prediction Interval')
    ax.set_xlabel("Date")
    ax.set_ylabel(f"{feature}_{brand}")
    ax.legend()
    ax.grid()
    # Store the figure
    figures[f"{feature}_{brand}"] = fig
    plt.close(fig)
for key, fig in figures.items():
    display(fig)
Share Volume¶
feature = 'Share_volume'
1, 2, 4, 5, 8 - XGB (with Share_value as extra predictor)¶
brands = [1, 2, 4, 5, 8]
for brand in brands:
    X_train = df[significant_predictors[f'Share_volume_{brand}'] + [f'Share_value_{brand}']]
    y_train = df[f'Share_volume_{brand}']
    X_pred = df_forecasted[significant_predictors[f'Share_volume_{brand}'] + [f'Share_value_{brand}']]
    model = xgb.XGBRegressor(
        **hp_xgb[f'{feature}_{brand}'],
        random_state=100531899
    )
    model.fit(X_train, y_train)
    forecast = model.predict(X_pred)
    df_forecasted[f'{feature}_{brand}'] = forecast
    plt.figure(figsize=(12, 4))
    plt.plot(df.index, df[f"{feature}_{brand}"], label="Train")
    plt.plot(df_forecasted.index, df_forecasted[f"{feature}_{brand}"], label="Forecast")
    plt.xlabel("Date")
    plt.ylabel(f"{feature}_{brand}")
    plt.legend()
    plt.grid()
    plt.show()
3, 6, 7 - ETS¶
brands = [3, 6, 7]
figures = {}
for brand in brands:
    y_train = df[f'{feature}_{brand}']
    model_ets = ETSModel(
        y_train,
        trend='add',
        seasonal='add',
        seasonal_periods=52
    ).fit()
    pred = model_ets.get_prediction(start=df_forecasted.index[0], end=df_forecasted.index[-1])
    cis = pred.pred_int(alpha=0.05)  # 95% prediction interval
    lower_ci = cis['lower PI (alpha=0.050000)']
    upper_ci = cis['upper PI (alpha=0.050000)']
    forecast = model_ets.forecast(steps=len(df_forecasted))
    df_forecasted[f'{feature}_{brand}'] = forecast
    # Create the figure but don't show it yet
    fig, ax = plt.subplots(figsize=(12, 4))
    ax.plot(df.index, df[f"{feature}_{brand}"], label="Train")
    ax.plot(df_forecasted.index, df_forecasted[f"{feature}_{brand}"], label="Forecast")
    ax.fill_between(df_forecasted.index, lower_ci, upper_ci, color='red', alpha=0.2, label='95% Prediction Interval')
    ax.set_xlabel("Date")
    ax.set_ylabel(f"{feature}_{brand}")
    ax.legend()
    ax.grid()
    # Store the figure
    figures[f"{feature}_{brand}"] = fig
    plt.close(fig)
for key, fig in figures.items():
    display(fig)
Storing the forecasts¶
df_forecasted.head()
| Week_number | Month | Year | Share_value_1_ly | Share_volume_1_ly | Share_value_2_ly | Share_volume_2_ly | Share_value_3_ly | Share_volume_3_ly | Share_value_4_ly | ... | Share_value_3 | Share_value_7 | Share_volume_1 | Share_volume_2 | Share_volume_4 | Share_volume_5 | Share_volume_8 | Share_volume_3 | Share_volume_6 | Share_volume_7 | |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 2024-02-04 | 5 | 2 | 2024 | 14.843 | 11.146 | 4.158 | 2.651 | 24.463 | 32.764 | 1.769 | ... | 24.888220 | 12.334084 | 11.401079 | 2.150232 | 2.619908 | 5.323566 | 7.860556 | 32.563128 | 2.151332 | 20.402724 |
| 2024-02-11 | 6 | 2 | 2024 | 13.602 | 9.986 | 5.214 | 3.217 | 25.293 | 33.504 | 2.563 | ... | 25.454229 | 12.454570 | 11.206799 | 2.150885 | 2.641761 | 5.376821 | 8.735213 | 33.178125 | 2.070866 | 20.279669 |
| 2024-02-18 | 7 | 2 | 2024 | 14.460 | 10.608 | 4.264 | 2.610 | 24.960 | 33.158 | 2.064 | ... | 25.846283 | 12.963166 | 10.751272 | 2.194237 | 2.602626 | 5.376821 | 8.126440 | 33.918602 | 1.948378 | 20.793714 |
| 2024-02-25 | 8 | 2 | 2024 | 14.605 | 10.924 | 4.129 | 2.471 | 25.970 | 34.558 | 2.459 | ... | 26.396933 | 12.584542 | 11.432586 | 2.144854 | 2.614239 | 6.169658 | 7.919020 | 34.736441 | 1.745369 | 19.919189 |
| 2024-03-03 | 9 | 3 | 2024 | 14.431 | 10.678 | 4.610 | 2.796 | 25.161 | 33.451 | 2.171 | ... | 25.767499 | 12.578038 | 10.578436 | 2.148454 | 2.595105 | 5.936694 | 7.642549 | 33.821768 | 1.996919 | 20.604655 |
5 rows × 51 columns
# Reset index to access the date
df_forecasted_i = df_forecasted.reset_index().rename(columns={'index': 'Date_Monday'})
# Create a list to store the reformatted data
long_format = []
# Iterate through brands
for brand in range(1, 9):
    long_format.append(pd.DataFrame({
        "Brand": f"Brand{brand}",
        "Date_Monday": df_forecasted_i["Date_Monday"],
        "Share_value_pred": df_forecasted_i[f"Share_value_{brand}"],
        "Share_volume_pred": df_forecasted_i[f"Share_volume_{brand}"]
    }))
# Concatenate all brands into a single DataFrame
df_final = pd.concat(long_format, ignore_index=True)
# Store it
df_final.to_csv("RJ.csv", index=False)
# Display the final output
df_final
| Brand | Date_Monday | Share_value_pred | Share_volume_pred | |
|---|---|---|---|---|
| 0 | Brand1 | 2024-02-04 | 15.622586 | 11.401079 |
| 1 | Brand1 | 2024-02-11 | 15.247246 | 11.206799 |
| 2 | Brand1 | 2024-02-18 | 14.918377 | 10.751272 |
| 3 | Brand1 | 2024-02-25 | 15.253229 | 11.432586 |
| 4 | Brand1 | 2024-03-03 | 14.857484 | 10.578436 |
| ... | ... | ... | ... | ... |
| 411 | Brand8 | 2024-12-29 | 14.912605 | 11.827853 |
| 412 | Brand8 | 2025-01-05 | 12.019563 | 8.217236 |
| 413 | Brand8 | 2025-01-12 | 11.629362 | 7.880356 |
| 414 | Brand8 | 2025-01-19 | 11.715905 | 7.999640 |
| 415 | Brand8 | 2025-01-26 | 11.905483 | 8.101236 |
416 rows × 4 columns
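Before submitting, it is worth sanity-checking that the long-format frame has exactly the expected shape: 8 brands × 52 weeks = 416 rows, one per (Brand, Date_Monday) pair, with no missing predictions. A self-contained sketch of those checks (constant values stand in for the real forecasts; only the assertions matter):

```python
import pandas as pd

# Illustrative rebuild of the submission layout with placeholder values
df_check = pd.concat(
    [
        pd.DataFrame({
            "Brand": f"Brand{b}",
            "Date_Monday": pd.date_range("2024-02-04", periods=52, freq="W-SUN"),
            "Share_value_pred": 10.0,
            "Share_volume_pred": 8.0,
        })
        for b in range(1, 9)
    ],
    ignore_index=True,
)

# 8 brands x 52 weeks = 416 rows, matching the frame above
assert df_check.shape == (416, 4)
# exactly one row per (Brand, Date_Monday) pair
assert not df_check.duplicated(subset=["Brand", "Date_Monday"]).any()
# no missing predictions anywhere
assert df_check[["Share_value_pred", "Share_volume_pred"]].notna().all().all()
print("submission shape OK")
```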
Conclusions¶
In this project, we analyzed a complex time-series dataset. We saw that the relationships between brands, predictors and targets were far from trivial (apart from the high correlation between share value and share volume within a given brand). Thanks to dynamic regression, we identified the significant predictors for each target and used them in the final models; some targets had no significant predictors and were better forecast with an ETS model than with XGBoost, which otherwise proved to be the best non-automatic model.
Forecasting the DP and Price predictors with ETS and feeding them into the other models gave better results than forecasting most targets directly with ETS, which suggests that these predictors are easier to predict and that the more complex ML models can combine this ETS information with other useful predictors. However, this approach does not let us compute prediction intervals the way the pure ETS models do.
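One common workaround for the missing intervals, sketched here under the assumption that in-sample residuals are representative of forecast errors, is to bootstrap residuals around the ML point forecast (all arrays below are synthetic stand-ins):

```python
import numpy as np

rng = np.random.default_rng(100531899)

# Stand-ins for the training series, its fitted values, and a 52-week point forecast
y_train = rng.normal(15, 2, 104)
fitted = y_train + rng.normal(0, 0.5, 104)
point_forecast = rng.normal(15, 2, 52)

residuals = y_train - fitted
# Resample residuals with replacement and add them to the point forecast 1000 times
sims = point_forecast + rng.choice(residuals, size=(1000, 52), replace=True)
lower = np.percentile(sims, 2.5, axis=0)
upper = np.percentile(sims, 97.5, axis=0)
# lower/upper now bracket an approximate 95% interval per forecast step
```

Note that intervals built this way tend to be too narrow at long horizons, since in-sample residuals understate multi-step forecast uncertainty.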
Finally, our forecasts capture the seasonality of each target, reproducing the clear peaks and valleys that recur in the same months each year as well as the overall level of the series.
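A quick way to verify that a forecast reproduces the annual seasonality is to check the lag-52 autocorrelation of the concatenated train-plus-forecast series. A self-contained sketch on a synthetic seasonal series:

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.arange(156)  # three years of weekly data (synthetic)
series = 10 + 3 * np.sin(2 * np.pi * t / 52) + rng.normal(0, 0.5, 156)

def acf_at_lag(x, lag):
    """Biased sample autocorrelation at a single lag."""
    x = x - x.mean()
    return (x[:-lag] * x[lag:]).sum() / (x * x).sum()

print(acf_at_lag(series, 52))  # clearly positive for a seasonal series
```

Applied to `pd.concat([df[col], df_forecasted[col]])` for each target, a clearly positive lag-52 value indicates the forecast continues the annual cycle rather than flattening it.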
# Create a 2x4 subplot grid for the share values
fig, axs = plt.subplots(2, 4, figsize=(16, 5))
axs = axs.flatten()
feature = "Share_value"
for brand in range(1, 9):
    # Select the appropriate axis for this brand
    ax = axs[brand - 1]
    # Plot the train and forecast values
    ax.plot(df.index, df[f"{feature}_{brand}"], label="Train", color='blue')
    ax.plot(df_forecasted.index, df_forecasted[f"{feature}_{brand}"], label="Forecast", color='red')
    # Labels
    ax.set_xlabel("Date")
    ax.set_ylabel(f"{feature}_{brand}")
    ax.tick_params(axis='x', rotation=30)
    ax.legend()
# Adjust and show
plt.tight_layout()
plt.show()
# Create a 2x4 subplot grid for the share volumes
fig, axs = plt.subplots(2, 4, figsize=(16, 5))
axs = axs.flatten()
feature = "Share_volume"
for brand in range(1, 9):
    # Select the appropriate axis for this brand
    ax = axs[brand - 1]
    # Plot the train and forecast values
    ax.plot(df.index, df[f"{feature}_{brand}"], label="Train", color='blue')
    ax.plot(df_forecasted.index, df_forecasted[f"{feature}_{brand}"], label="Forecast", color='red')
    # Labels
    ax.set_xlabel("Date")
    ax.set_ylabel(f"{feature}_{brand}")
    ax.tick_params(axis='x', rotation=30)
    ax.legend()
# Adjust and show
plt.tight_layout()
plt.show()